Measuring Affinity between Attention-Head Weight Subspaces via the Projection Kernel
By: Hiroaki Yamagiwa, Yusuke Takase, Hidetoshi Shimodaira
Understanding relationships between attention heads is essential for interpreting the internal structure of Transformers, yet existing metrics do not capture this structure well. We focus on the subspaces spanned by attention-head weight matrices and quantify head-to-head relationships using the Projection Kernel (PK), a principal-angle-based measure of subspace similarity. Experiments show that PK reproduces known head-to-head interactions on the IOI task more clearly than prior metrics such as the Composition Score. We further introduce a framework to quantify the informativeness of PK distributions by comparing them with a reference distribution derived from random orthogonal subspaces. As an application, we analyze a directed graph constructed from PK and show that, in GPT2-small, L4H7 acts as a hub by functioning as an identity head.
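The abstract does not spell out the exact formula, but a principal-angle-based subspace similarity of this kind is commonly computed as ||UᵀV||²_F = Σᵢ cos²θᵢ, where U and V are orthonormal bases of the column spaces of two head weight matrices. The NumPy sketch below illustrates that computation on toy GPT-2-small-sized matrices (model dimension 768, head dimension 64); the helper name `projection_kernel_similarity` and this particular normalization are assumptions for illustration and may differ from the paper's exact definition of the Projection Kernel.

```python
import numpy as np

def projection_kernel_similarity(A: np.ndarray, B: np.ndarray) -> float:
    """Principal-angle-based affinity between the column spaces of A and B.

    Orthonormalize each weight matrix with a thin QR decomposition, then sum
    the squared cosines of the principal angles, which equals ||U^T V||_F^2.
    (One common projection-kernel-style similarity; the paper's normalization
    may differ.)
    """
    U, _ = np.linalg.qr(A)  # orthonormal basis of col(A), shape (d, k)
    V, _ = np.linalg.qr(B)  # orthonormal basis of col(B), shape (d, k)
    # Singular values of U^T V are the cosines of the principal angles.
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(np.sum(cosines ** 2))

# Toy example: two random head-sized weight matrices (768 x 64, as in GPT-2 small).
rng = np.random.default_rng(0)
W_head_a = rng.standard_normal((768, 64))
W_head_b = rng.standard_normal((768, 64))
print(projection_kernel_similarity(W_head_a, W_head_b))
```

For random subspaces like these, the similarity concentrates around k²/d (here 64²/768 ≈ 5.3), which is the kind of random-orthogonal-subspace reference distribution the abstract describes comparing against.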
Similar Papers
Vector Arithmetic in Concept and Token Subspaces
Computation and Language
Studies vector arithmetic in concept and token subspaces, covering word meanings and token-level spelling.
Inductive Bias and Spectral Properties of Single-Head Attention in High Dimensions
Machine Learning (Stat)
Analyzes the inductive bias and spectral properties of single-head attention in the high-dimensional regime.
A Reproduction Study: The Kernel PCA Interpretation of Self-Attention Fails Under Scrutiny
Machine Learning (CS)
Reproduction study reporting that the kernel PCA interpretation of self-attention does not hold up under scrutiny.