From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons Alignment
By: Chongxuan Huang, Yongshi Ye, Biao Fu and more
Potential Business Impact:
Tests how well computers understand different languages.
Large language models (LLMs) have demonstrated remarkable multilingual capabilities; however, how to evaluate their cross-lingual alignment remains underexplored. Existing alignment benchmarks primarily focus on sentence embeddings, but prior research has shown that neural models tend to induce a non-smooth representation space, which undermines semantic alignment evaluation for low-resource languages. Inspired by neuroscientific findings that similar information activates overlapping neuronal regions, we propose Neuron State-Based Cross-Lingual Alignment (NeuronXA), a more semantically grounded approach to assessing the cross-lingual alignment capabilities of LLMs. We evaluate NeuronXA on several prominent multilingual LLMs (LLaMA, Qwen, Mistral, GLM, and OLMo) across two transfer tasks and three multilingual benchmarks. The results show that, with only 100 parallel sentence pairs, NeuronXA achieves a Pearson correlation of 0.9556 with downstream task performance and 0.8514 with transferability. These findings demonstrate NeuronXA's effectiveness in assessing both cross-lingual alignment and transferability, even with a small dataset, and highlight its potential to advance cross-lingual alignment research and to improve the semantic understanding of multilingual LLMs.
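The idea of comparing neuron states across parallel sentences can be illustrated with a small sketch. The abstract does not spell out how NeuronXA defines a neuron state or aggregates scores, so everything below is an assumption made for illustration: a neuron counts as "active" when its activation exceeds a threshold, two sentences are compared by the fraction of neurons whose binary state agrees, and alignment is scored as retrieval accuracy over parallel pairs. The function names (`neuron_state`, `state_overlap`, `alignment_score`, `pearson`) and the toy activation values are hypothetical and are not taken from the paper.

```python
import math


def neuron_state(activations, threshold=0.0):
    """Binarize activations: 1 if the neuron exceeds the threshold, else 0.

    Assumption: the paper's actual definition of a neuron state may differ.
    """
    return [1 if a > threshold else 0 for a in activations]


def state_overlap(acts_a, acts_b, threshold=0.0):
    """Fraction of neurons whose binary state agrees between two sentences."""
    sa = neuron_state(acts_a, threshold)
    sb = neuron_state(acts_b, threshold)
    agree = sum(1 for x, y in zip(sa, sb) if x == y)
    return agree / len(sa)


def alignment_score(src_acts, tgt_acts):
    """Retrieval accuracy: how often the true translation (same index)
    is the target sentence with the highest state overlap."""
    hits = 0
    for i, a in enumerate(src_acts):
        sims = [state_overlap(a, b) for b in tgt_acts]
        best = max(range(len(sims)), key=sims.__getitem__)
        if best == i:
            hits += 1
    return hits / len(src_acts)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy activations (made up): 3 parallel sentence pairs, 4 neurons each.
src = [[0.9, -0.2, 0.4, -0.7], [-0.5, 0.8, -0.1, 0.3], [0.2, 0.6, -0.9, -0.4]]
tgt = [[0.7, -0.1, 0.5, -0.3], [-0.2, 0.4, -0.6, 0.9], [0.1, 0.3, -0.2, -0.8]]
score = alignment_score(src, tgt)
```

In a real evaluation, the activations would come from a model's intermediate layers for each sentence, and `pearson` would be applied to per-language alignment scores paired with per-language downstream results; no such values are reproduced here.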
Similar Papers
How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective
Computation and Language
Helps computers learn many languages better.
Can you map it to English? The Role of Cross-Lingual Alignment in Multilingual Performance of LLMs
Computation and Language
Helps computers understand many languages without extra training.
Language-specific Neurons Do Not Facilitate Cross-Lingual Transfer
Computation and Language
Helps computers understand less common languages better.