What do language models model? Transformers, automata, and the format of thought
By: Colin Klein
Potential Business Impact:
Computers learn language like a machine, not a brain.
What do large language models actually model? Do they tell us something about human capacities, or are they models of the corpus we've trained them on? I give a non-deflationary defence of the latter position. Cognitive science tells us that linguistic capabilities in humans rely on supralinear formats for computation. The transformer architecture, by contrast, supports at best linear formats for processing. This argument relies primarily on certain invariants of the computational architecture of transformers. I then suggest a positive story about what transformers are doing, focusing on Liu et al. (2022)'s intriguing speculations about shortcut automata. I conclude with why I don't think this is a terribly deflationary story. Language is not (just) a means for expressing inner states but also a kind of 'discourse machine' that lets us make new language given appropriate context. We have learned to use this technology in one way; LLMs have learned to use it too, but via very different means.
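The "shortcut automata" idea the abstract gestures at can be sketched concretely. A minimal illustration, assuming a hypothetical parity automaton (not from the paper): instead of simulating an automaton one symbol at a time, each symbol is represented as a transition function on states, and the functions are composed in a balanced tree order. Because composition is associative, the whole run collapses to a log-depth reduction, the kind of "shortcut" a fixed-depth parallel architecture can realise, rather than a step-by-step recurrence.

```python
# Sketch of the shortcut idea: replace step-by-step automaton simulation
# with an associative composition of per-symbol transition functions,
# which a balanced (tree-shaped) reduction can evaluate in O(log n) depth.
# The parity automaton here is a hypothetical example for illustration.

def delta(state, symbol):
    # Transition function: track the parity of 1s seen so far.
    return state ^ symbol

def run_sequential(word, start=0):
    """Baseline: one step per symbol (depth O(n))."""
    state = start
    for symbol in word:
        state = delta(state, symbol)
    return state

def transition_fn(symbol):
    """Represent a symbol as a function on states (here, a dict)."""
    return {s: delta(s, symbol) for s in (0, 1)}

def compose(f, g):
    """Apply f, then g. Composition is associative, so any bracketing
    of the word's transition functions yields the same result."""
    return {s: g[f[s]] for s in f}

def run_shortcut(word, start=0):
    """Fold transition functions pairwise in a balanced order,
    mimicking a log-depth circuit rather than a linear recurrence."""
    fns = [transition_fn(sym) for sym in word]
    if not fns:
        return start
    while len(fns) > 1:
        fns = [compose(fns[i], fns[i + 1]) if i + 1 < len(fns) else fns[i]
               for i in range(0, len(fns), 2)]
    return fns[0][start]

word = [1, 0, 1, 1, 0, 1]
print(run_sequential(word), run_shortcut(word))  # both give parity of 1s
```

Both routes compute the same function, but by structurally different means, which is the sense in which a transformer can match an automaton's behaviour without implementing its sequential format.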
Similar Papers
The Role of Logic and Automata in Understanding Transformers
Formal Languages and Automata Theory
Makes computers understand language and solve problems.
Large language models are not about language
Computation and Language
Teaches computers how minds learn language.
Three tiers of computation in transformers and in brain architectures
Computation and Language
Teaches computers to think and reason better.