SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens
By: Nikita Dragunov, Temurbek Rahmatullaev, Elizaveta Goncharova, and more
Potential Business Impact:
Makes AI write better by thinking in whole sentences, not single words.
The recently proposed Large Concept Model (LCM) generates text by predicting a sequence of sentence-level embeddings and training with either mean-squared error or diffusion objectives. We present SONAR-LLM, a decoder-only transformer that "thinks" in the same continuous SONAR embedding space, yet is supervised through token-level cross-entropy propagated via the frozen SONAR decoder. This hybrid objective retains the semantic abstraction of LCM while eliminating its diffusion sampler and restoring a likelihood-based training signal. Across model sizes from 39M to 1.3B parameters, SONAR-LLM attains competitive generation quality. We report scaling trends, ablations, benchmark results, and release the complete training code and all pretrained checkpoints to foster reproducibility and future research.
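The hybrid objective described above can be illustrated with a minimal PyTorch sketch: a trainable "concept" model predicts the next sentence embedding, a frozen decoder stand-in maps that embedding to token logits, and token-level cross-entropy is backpropagated through the frozen decoder into the concept model. All module names, dimensions, and shapes here are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not from the paper).
EMB_DIM, VOCAB, SEQ_LEN = 16, 32, 5

class ConceptLM(nn.Module):
    """Stand-in for the decoder-only transformer that predicts the
    next sentence embedding from preceding sentence embeddings."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(EMB_DIM, EMB_DIM)

    def forward(self, prev_embs):      # (batch, EMB_DIM)
        return self.proj(prev_embs)    # predicted next-sentence embedding

class FrozenSonarDecoder(nn.Module):
    """Stand-in for the frozen SONAR decoder: maps one sentence
    embedding to per-token logits. Its parameters receive no updates,
    but gradients still flow through it to the embedding input."""
    def __init__(self):
        super().__init__()
        self.out = nn.Linear(EMB_DIM, VOCAB)
        for p in self.parameters():
            p.requires_grad_(False)    # frozen

    def forward(self, emb):            # (batch, EMB_DIM) -> (batch, SEQ_LEN, VOCAB)
        return self.out(emb).unsqueeze(1).expand(-1, SEQ_LEN, -1)

lm, decoder = ConceptLM(), FrozenSonarDecoder()
prev_embs = torch.randn(4, EMB_DIM)                    # preceding sentence embeddings
target_tokens = torch.randint(0, VOCAB, (4, SEQ_LEN))  # tokens of the next sentence

pred_emb = lm(prev_embs)               # "think" in continuous embedding space
logits = decoder(pred_emb)             # "speak" in tokens via the frozen decoder
loss = nn.functional.cross_entropy(    # token-level cross-entropy signal
    logits.reshape(-1, VOCAB), target_tokens.reshape(-1))
loss.backward()                        # gradient reaches only the concept LM

assert lm.proj.weight.grad is not None      # trainable model gets gradients
assert decoder.out.weight.grad is None      # frozen decoder does not
```

The key point of the sketch is the gradient path: because the decoder is frozen but differentiable, the likelihood-based token loss supervises the embedding predictor directly, with no diffusion sampler or MSE regression target needed.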
Similar Papers
Let's Predict Sentence by Sentence
Computation and Language
Computers learn to think in ideas, not just words.
Comprehend and Talk: Text to Speech Synthesis via Dual Language Modeling
Sound
Makes computer voices sound more natural and human.
Semantic-Enhanced Time-Series Forecasting via Large Language Models
Machine Learning (CS)
Helps computers predict future events better.