When Models Manipulate Manifolds: The Geometry of a Counting Task
By: Wes Gurnee, Emmanuel Ameisen, Isaac Kauvar and more
Potential Business Impact:
A computer "sees" lines of text by counting characters.
Language models can perceive visual properties of text despite receiving only sequences of tokens. We mechanistically investigate how Claude 3.5 Haiku accomplishes one such task: linebreaking in fixed-width text. We find that character counts are represented on low-dimensional curved manifolds discretized by sparse feature families, analogous to biological place cells. Accurate predictions emerge from a sequence of geometric transformations: token lengths are accumulated into character count manifolds, attention heads twist these manifolds to estimate distance to the line boundary, and the decision to break the line is enabled by arranging estimates orthogonally to create a linear decision boundary. We validate our findings through causal interventions and discover visual illusions: character sequences that hijack the counting mechanism. Our work demonstrates the rich sensory processing of early layers, the intricacy of attention algorithms, and the importance of combining feature-based and geometric views of interpretability.
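To make the linebreaking task concrete, here is a minimal sketch of the decision the model has to reproduce, written as explicit arithmetic. It is not the model's internal mechanism, which the abstract describes as geometric operations on manifolds. The function name `should_break`, its parameters, and the example tokens are hypothetical and chosen only for illustration.

```python
def should_break(token_lengths, next_token_length, line_width):
    """Return True if the next token would not fit on the current line.

    Illustrative only: an explicit version of the three steps named in the
    abstract (accumulate lengths, estimate distance to boundary, decide).
    """
    # Step 1: accumulate token lengths into a running character count.
    chars_so_far = sum(token_lengths)
    # Step 2: estimate the distance to the line boundary.
    chars_remaining = line_width - chars_so_far
    # Step 3: break when the next token is longer than the remaining space.
    return next_token_length > chars_remaining


# Hypothetical example: a 24-character line holding 19 characters so far.
line = ["The", " quick", " brown", " fox"]
lengths = [len(t) for t in line]  # 3 + 6 + 6 + 4 = 19 characters
print(should_break(lengths, len(" jumps"), line_width=24))  # 6 > 5 remaining
print(should_break(lengths, len(" ran"), line_width=24))    # 4 <= 5 remaining
```

The comparison in step 3 is a simple threshold on two quantities, the remaining space and the next token's length. This gives some intuition for why the abstract speaks of a linear decision boundary once the two estimates are represented separately.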