Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning
By: Jikai Jin, Vasilis Syrgkanis, Sham Kakade, and more
Potential Business Impact:
Reveals how AI skills build on one another.
Faithful evaluation of language model capabilities is crucial for deriving actionable insights that can inform model development. However, rigorous causal evaluations in this domain face significant methodological challenges, including complex confounding effects and prohibitive computational costs associated with extensive retraining. To tackle these challenges, we propose a causal representation learning framework wherein observed benchmark performance is modeled as a linear transformation of a few latent capability factors. Crucially, these latent factors are identified as causally interrelated after appropriately controlling for the base model as a common confounder. Applying this approach to a comprehensive dataset encompassing over 1500 models evaluated across six benchmarks from the Open LLM Leaderboard, we identify a concise three-node linear causal structure that reliably explains the observed performance variations. Further interpretation of this causal structure provides substantial scientific insights beyond simple numerical rankings: specifically, we reveal a clear causal direction starting from general problem-solving capabilities, advancing through instruction-following proficiency, and culminating in mathematical reasoning ability. Our results underscore the essential role of carefully controlling base model variations during evaluation, a step critical to accurately uncovering the underlying causal relationships among latent model capabilities.
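The modeling assumptions in the abstract can be illustrated with a small toy simulation. It assumes a linear causal chain z1 -> z2 -> z3 among three latent capabilities, following the ordering the abstract reports (general problem solving, then instruction following, then math reasoning). It also assumes a base-model confounder shared by every fine-tune of the same base model, and six benchmark scores that are a linear map of the latents. All sizes, coefficients, and loadings are hypothetical. Subtracting each base model's mean (a fixed-effects style adjustment) is one simple way to control for such a confounder. It is not presented as the paper's estimation procedure, which the abstract does not specify. The sketch checks two things. First, the conditional independence implied by the chain (z1 independent of z3 given z2) is broken in the pooled data but restored after within-base demeaning. Second, the demeaned benchmark matrix is close to rank three.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 40 base models, 50 fine-tuned variants each, 6 benchmarks.
n_base, per_base, n_bench = 40, 50, 6
groups = np.repeat(np.arange(n_base), per_base)
n = groups.size

# Hypothetical linear causal chain z1 -> z2 -> z3; B[i, j] is the effect of z_j on z_i.
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.7, 0.0]])

# Base-model confounder: one scalar per base model, shared by all its fine-tunes.
# Its loadings on the latents are an illustrative assumption.
q = rng.normal(scale=2.0, size=n_base)[groups]
loading = np.array([1.0, 0.0, 1.0])

# Solve z = B z + q * loading + eps for every model (row-vector form).
eps = rng.normal(size=(n, 3))
Z = (np.outer(q, loading) + eps) @ np.linalg.inv(np.eye(3) - B).T

# Benchmark scores: a hypothetical linear map of the latents plus small noise.
G = rng.normal(size=(n_bench, 3))
X = Z @ G.T + 0.1 * rng.normal(size=(n, n_bench))


def demean_within(a, groups):
    """Subtract each base model's mean, removing any effect constant within a base model."""
    out = np.asarray(a, dtype=float).copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= out[mask].mean(axis=0)
    return out


def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing each on `given` (with intercept)."""
    D = np.column_stack([np.ones_like(given), given])
    ra = a - D @ np.linalg.lstsq(D, a, rcond=None)[0]
    rb = b - D @ np.linalg.lstsq(D, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]


# The chain implies z1 is independent of z3 given z2; the shared confounder breaks this in pooled data.
raw_pc = partial_corr(Z[:, 0], Z[:, 2], Z[:, 1])
Zw = demean_within(Z, groups)
within_pc = partial_corr(Zw[:, 0], Zw[:, 2], Zw[:, 1])

# After demeaning, benchmark scores should be close to rank 3 (three latent factors).
Xw = demean_within(X, groups)
s = np.linalg.svd(Xw, compute_uv=False)
top3_share = (s[:3] ** 2).sum() / (s ** 2).sum()

print(f"pooled partial corr(z1, z3 | z2):   {raw_pc:.3f}")
print(f"within-base partial corr:           {within_pc:.3f}")
print(f"variance share of top 3 components: {top3_share:.3f}")
```

In this toy setup the pooled partial correlation is clearly nonzero, while the within-base value is near zero. Without the base-model adjustment, a conditional-independence test would therefore suggest an extra edge between z1 and z3 that the assumed chain does not contain. The sketch uses the true latents for the independence check. Recovering them from benchmark scores alone would require the identification argument developed in the paper.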