Project summary
Building a new type of language model based on discrete rules (Prolog-like) as opposed to continuous neurons.
What are this project's goals? How will you achieve them?
First, improving language model interpretability.
It can be a significant challenge to understand what exactly is going on in a modern Transformer after token embedding. The embedding travels along the residual stream for many layers, and at each layer the stream is transformed based on the residual streams of all previous tokens at that layer. The weights and intermediate values are often 16-bit numbers or wider.
Discrete rules may be uniquely interpretable.
Second, improving performance, training efficiency, and continual learning capabilities.
Many intelligent processes can be broken down into discrete steps. For example: when you see the characters " c a t ", you can recognize the word "cat". This requires no continuous process. Important tasks such as arithmetic and traversing a knowledge graph are also discrete, and current LLMs often fail to learn robust and generalizable solutions for them.
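As a rough illustration of what "discrete" means here, the sketch below shows both examples in a few lines of Python: an exact-match rule that maps a character sequence to a word, and a Prolog-style rule applied repeatedly to traverse a tiny knowledge graph. This is not the proposed model. The facts, the lexicon, and the function names `recognize` and `forward_chain` are hypothetical and chosen only for this example.

```python
# Facts are (subject, relation, object) triples. All names are illustrative.
FACTS = {
    ("cat", "is_a", "mammal"),
    ("mammal", "is_a", "animal"),
}

# Discrete lexical rule: an exact character sequence maps to a word.
LEXICON = {("c", "a", "t"): "cat"}


def recognize(chars):
    """Return the word spelled by chars, or None if no rule matches."""
    return LEXICON.get(tuple(chars))


def forward_chain(facts):
    """Apply is_a(X, Z) :- is_a(X, Y), is_a(Y, Z) until no new fact appears."""
    derived = set(facts)
    while True:
        new = {
            (x, "is_a", z)
            for (x, r1, y1) in derived
            for (y2, r2, z) in derived
            if r1 == r2 == "is_a" and y1 == y2
        } - derived
        if not new:
            return derived
        derived |= new


word = recognize(["c", "a", "t"])
closure = forward_chain(FACTS)
print(word, (word, "is_a", "animal") in closure)  # prints: cat True
```

Every intermediate result in this sketch is a symbol or a triple that can be read directly, which is the property the interpretability goal relies on. A learned system would have to discover such rules rather than have them written by hand.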
Discrete rules may be more performant and better for representation learning in language modeling.
I will pursue these goals through thorough study and analysis of discrete language models.
How will this funding be used?
Funding will be used to support myself and for compute.
Who is on your team? What's your track record on similar projects?
Just me. I did research at university, applying machine learning to improve classification of electrical signal outputs from medical sensors. I was an initial member of a data observability startup, researching the core ML algorithms behind the platform (the startup went on to be YC-funded).
What are the most likely causes and outcomes if this project fails?
The most likely cause of failure is cost: Transformers may remain more cost-efficient, since current accelerator architectures are optimized for dense floating-point matrix operations. If the discrete language model shows benefits, GPU and ASIC co-design could change the economics down the road.
How much money have you raised in the last 12 months, and from where?
None.