Modeling Transformers as complex networks to analyze learning dynamics
By: Elisabetta Rocchetti
Potential Business Impact:
Shows how AI language models pick up new skills as they train.
The process by which Large Language Models (LLMs) acquire complex capabilities during training remains a key open question in mechanistic interpretability. This project investigates whether these learning dynamics can be characterized through the lens of Complex Network Theory (CNT). I introduce a novel methodology that represents a Transformer-based LLM as a directed, weighted graph whose nodes are the model's computational components (attention heads and MLPs) and whose edges represent causal influence, measured via an intervention-based ablation technique. By tracking the evolution of this component graph across 143 training checkpoints of the Pythia-14M model on a canonical induction task, I analyze a suite of graph-theoretic metrics. The results reveal that the network's structure evolves through distinct phases of exploration, consolidation, and refinement. Specifically, I identify the emergence of a stable hierarchy of information-spreader components and a dynamic set of information-gatherer components, whose roles reconfigure at key learning junctures. This work demonstrates that a component-level network perspective offers a powerful macroscopic lens for visualizing and understanding the self-organizing principles that drive the formation of functional circuits in LLMs.
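To make the methodology concrete, here is a minimal sketch, not the project's actual code, of how such a component graph could be built and queried in Python with networkx. The ablation_influence stub, the L{layer}.H{head} / L{layer}.MLP naming convention, the edge threshold, and the layer/head counts are all illustrative assumptions; a real run would patch activations in Pythia-14M and re-measure induction-task performance.

import itertools
import random

import networkx as nx

random.seed(0)  # reproducible placeholder scores

N_LAYERS, N_HEADS = 6, 4  # illustrative counts, roughly Pythia-14M scale

# Enumerate components: one node per attention head and per MLP block.
components = [f"L{l}.H{h}" for l in range(N_LAYERS) for h in range(N_HEADS)]
components += [f"L{l}.MLP" for l in range(N_LAYERS)]

def layer(name: str) -> int:
    """Layer index encoded in a component name such as 'L3.H2' or 'L3.MLP'."""
    return int(name.split(".")[0][1:])

def ablation_influence(src: str, dst: str) -> float:
    """Placeholder for the intervention-based measure: how much ablating
    `src` changes `dst`'s contribution on the induction task. A real run
    would patch activations in the model and re-evaluate the task."""
    return random.random()

# Build the directed, weighted component graph for one checkpoint.
# Simplification: edges only run from earlier layers to later ones
# (a real block also lets heads feed the same layer's MLP).
G = nx.DiGraph()
G.add_nodes_from(components)
for src, dst in itertools.permutations(components, 2):
    if layer(src) < layer(dst):
        w = ablation_influence(src, dst)
        if w > 0.8:  # illustrative threshold: keep only strong causal edges
            G.add_edge(src, dst, weight=w)

# Weighted out-strength ~ "information spreader"; in-strength ~ "gatherer".
spreaders = sorted(G.nodes, key=lambda n: G.out_degree(n, weight="weight"), reverse=True)
gatherers = sorted(G.nodes, key=lambda n: G.in_degree(n, weight="weight"), reverse=True)
print("top spreaders:", spreaders[:3])
print("top gatherers:", gatherers[:3])

Repeating this construction at each of the 143 checkpoints and comparing the rankings and graph metrics over time is what would surface the exploration, consolidation, and refinement phases described above.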
Similar Papers
A Mathematical Explanation of Transformers for Large Language Models and GPTs
Machine Learning (CS)
Explains how AI learns by seeing patterns.
Network Dynamics-Based Framework for Understanding Deep Neural Networks
Machine Learning (CS)
Explains how deep networks get smarter as they train.
Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits
Machine Learning (CS)
Finds hidden, separate jobs inside an AI's brain.