Score: 1

SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Published: August 7, 2025 | arXiv ID: 2508.05305v1

By: Nikita Dragunov, Temurbek Rahmatullaev, Elizaveta Goncharova, and more

Potential Business Impact:

Makes AI write better by thinking in whole sentences (sentence embeddings) rather than one word at a time.

The recently proposed Large Concept Model (LCM) generates text by predicting a sequence of sentence-level embeddings and is trained with either mean-squared error or diffusion objectives. We present SONAR-LLM, a decoder-only transformer that "thinks" in the same continuous SONAR embedding space, yet is supervised through token-level cross-entropy propagated via the frozen SONAR decoder. This hybrid objective retains the semantic abstraction of LCM while eliminating its diffusion sampler and restoring a likelihood-based training signal. Across model sizes from 39M to 1.3B parameters, SONAR-LLM attains competitive generation quality. We report scaling trends, ablations, and benchmark results, and release the complete training code and all pretrained checkpoints to foster reproducibility and future research.
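To make the hybrid objective concrete: the backbone predicts the next sentence embedding, a frozen SONAR-style decoder maps that embedding to token logits, and token-level cross-entropy is backpropagated through the frozen decoder into the backbone. The PyTorch sketch below illustrates this flow under toy dimensions; the `SentenceBackbone`, `FrozenSonarDecoder`, and `sonar_llm_step` names, shapes, and architecture details are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy dimensions for a light, runnable sketch (not the real SONAR sizes).
EMB_DIM, VOCAB, HIDDEN = 256, 8000, 512

class SentenceBackbone(nn.Module):
    """Decoder-only transformer that autoregressively predicts the next
    sentence embedding from the sequence of previous sentence embeddings."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=EMB_DIM, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(EMB_DIM, EMB_DIM)

    def forward(self, sent_embs):                      # (B, S, EMB_DIM)
        causal = nn.Transformer.generate_square_subsequent_mask(sent_embs.size(1))
        h = self.blocks(sent_embs, mask=causal)
        return self.head(h)                            # predicted next-sentence embeddings

class FrozenSonarDecoder(nn.Module):
    """Stand-in for the frozen SONAR decoder: maps one sentence embedding to
    token logits. Gradients flow *through* it, but its weights never update."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(EMB_DIM, HIDDEN)
        self.lm_head = nn.Linear(HIDDEN, VOCAB)
        for p in self.parameters():
            p.requires_grad_(False)                    # frozen

    def forward(self, emb, max_len):                   # (N, EMB_DIM) -> (N, T, VOCAB)
        h = torch.tanh(self.proj(emb)).unsqueeze(1).expand(-1, max_len, -1)
        return self.lm_head(h)

def sonar_llm_step(backbone, sonar_dec, sent_embs, target_tokens):
    """One hybrid-objective step: predict the next sentence embedding, decode it
    with the frozen decoder, and score the result with token-level cross-entropy
    so the backbone receives a likelihood-based training signal."""
    pred_embs = backbone(sent_embs[:, :-1])            # predict embeddings of sentences 2..S
    B, S1, _ = pred_embs.shape
    logits = sonar_dec(pred_embs.reshape(B * S1, EMB_DIM), target_tokens.size(-1))
    return F.cross_entropy(logits.reshape(-1, VOCAB), target_tokens.reshape(-1),
                           ignore_index=-100)

if __name__ == "__main__":
    backbone, sonar_dec = SentenceBackbone(), FrozenSonarDecoder()
    sents = torch.randn(2, 4, EMB_DIM)                 # 2 documents x 4 sentence embeddings
    toks = torch.randint(0, VOCAB, (2, 3, 16))         # gold tokens of sentences 2..4
    loss = sonar_llm_step(backbone, sonar_dec, sents, toks)
    loss.backward()                                    # gradients reach only the backbone
    print(float(loss))
```

The key design point the sketch highlights is that the loss is computed on tokens, not on embeddings, yet only the backbone is updated: the decoder is a fixed, differentiable bridge from the continuous "concept" space back to a token-level likelihood.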

Repos / Data Links

Page Count
7 pages

Category
Computer Science: Computation and Language