Deep Language Geometry: Constructing a Metric Space from LLM Weights
By: Maksym Shamrai, Vladyslav Hamolia
We introduce a novel framework that utilizes the internal weight activations of modern Large Language Models (LLMs) to construct a metric space of languages. Unlike traditional approaches based on hand-crafted linguistic features, our method automatically derives high-dimensional vector representations by computing weight importance scores via an adapted pruning algorithm. Our approach captures intrinsic language characteristics that reflect linguistic phenomena. We validate our approach across diverse datasets and multilingual LLMs, covering 106 languages. The results align well with established linguistic families while also revealing unexpected inter-language connections that may indicate historical contact or language evolution. The source code, computed language latent vectors, and visualization tool are made publicly available at https://github.com/mshamrai/deep-language-geometry.
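The abstract describes deriving one high-dimensional importance vector per language from LLM weights and then forming a metric space over languages. The sketch below illustrates that second step only, under assumptions: the vectors here are random stand-ins for the real weight-importance scores, and cosine distance is one plausible (hypothetical) choice of distance, not necessarily the one the paper uses.

```python
import numpy as np

# Hypothetical illustration: given one latent vector per language
# (in the paper, these come from weight-importance scores computed
# via an adapted pruning algorithm), build a pairwise distance
# matrix that defines a space over languages.
rng = np.random.default_rng(0)
languages = ["en", "de", "uk", "ja"]
# Stand-in vectors; the real ones are derived from LLM weights.
vectors = {lang: rng.random(512) for lang in languages}

def cosine_distance(u, v):
    """1 minus cosine similarity; 0 for identically oriented vectors."""
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Pairwise distance matrix over the language set.
D = np.array(
    [[cosine_distance(vectors[a], vectors[b]) for b in languages]
     for a in languages]
)

# Basic sanity checks: symmetry and zero self-distance.
assert np.allclose(D, D.T)
assert np.allclose(np.diag(D), 0.0)
```

Such a matrix can then be fed to clustering or dimensionality-reduction tools to compare the induced groupings against established language families.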