Language Models Are Implicitly Continuous
By: Samuele Marro, Davide Evangelista, X. Angelo Huang, and more
Potential Business Impact:
Computers see sentences as smooth, flowing ideas.
Language is typically modelled with discrete sequences. However, the most successful approaches to language modelling, namely neural networks, are continuous and smooth function approximators. In this work, we show that Transformer-based language models implicitly learn to represent sentences as continuous-time functions defined over a continuous input space. This phenomenon occurs in most state-of-the-art Large Language Models (LLMs), including Llama2, Llama3, Phi3, Gemma, Gemma2, and Mistral, and suggests that LLMs reason about language in ways that fundamentally differ from how humans do. Our work formally extends Transformers to capture the nuances of time and space continuity in both the input and output spaces. Our results challenge the traditional interpretation of how LLMs understand language, with several linguistic and engineering implications.
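One way to get an intuition for the "continuous input space" claim is to probe a Transformer with inputs that lie between discrete sentences. The sketch below is illustrative only and is not the paper's formal construction: it feeds convex combinations of the embeddings of two prompts (differing in a single token) into a causal language model via Hugging Face's `inputs_embeds` argument and prints how the predicted next token changes with the interpolation weight. The model name `gpt2` is a lightweight stand-in assumption; the paper studies Llama-, Gemma-, Phi-, and Mistral-class models.

```python
# Illustrative probe (assumption: not the paper's method): interpolate between
# the embeddings of two prompts and observe how the next-token prediction
# changes as we move through the continuous input space.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # hypothetical stand-in for the LLMs studied in the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

embed = model.get_input_embeddings()  # token-id -> embedding lookup table


def embed_prompt(text: str) -> torch.Tensor:
    """Return the (1, seq_len, d_model) embedding sequence for a prompt."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    return embed(ids)


# Two prompts that differ in a single word (and tokenize to the same length).
e_cat = embed_prompt("The cat sat on the")
e_dog = embed_prompt("The dog sat on the")
assert e_cat.shape == e_dog.shape, "prompts must have the same token length"

with torch.no_grad():
    for alpha in torch.linspace(0.0, 1.0, 5):
        # Convex combination of the two embedding sequences: a point "between"
        # two discrete sentences in the continuous input space.
        mixed = (1 - alpha) * e_cat + alpha * e_dog
        logits = model(inputs_embeds=mixed).logits[0, -1]
        top_id = logits.argmax().item()
        print(f"alpha={alpha:.2f} -> predicted next token: {tokenizer.decode(top_id)!r}")
```

If the model behaved purely discretely, intermediate values of `alpha` would be meaningless; smooth, gradual changes in the output distribution are the kind of behaviour the paper's continuity argument is about.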
Similar Papers
Let's Predict Sentence by Sentence
Computation and Language
Computers learn to think in ideas, not just words.
Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows
Machine Learning (CS)
Lets computers understand words better, faster, and more flexibly.
A Mathematical Explanation of Transformers for Large Language Models and GPTs
Machine Learning (CS)
Explains how AI learns by seeing patterns.