Superposition disentanglement of neural representations reveals hidden alignment
By: André Longon, David Klindt, Meenakshi Khosla
Potential Business Impact:
Helps researchers measure how similarly AI models and brains represent information.
The superposition hypothesis states that a single neuron within a population may participate in the representation of multiple features, allowing the population to represent more features than it has neurons. In neuroscience and AI, representational alignment metrics measure the extent to which different deep neural networks (DNNs) or brains represent similar information. In this work, we explore a critical question: does superposition interact with alignment metrics in any undesirable way? We hypothesize that models which represent the same features in different superposition arrangements, i.e., whose neurons carry different linear combinations of those features, will interfere with predictive mapping metrics (semi-matching, soft-matching, linear regression), producing lower alignment than expected. We first develop a theory for how strict permutation metrics depend on superposition arrangements. We then test it by training sparse autoencoders (SAEs) to disentangle superposition in toy models, where alignment scores typically increase when a model's base neurons are replaced with its sparse overcomplete latent codes. We find similar increases for DNN→DNN and DNN→brain linear regression alignment in the visual domain. Our results suggest that superposition disentanglement is necessary for mapping metrics to uncover the true representational alignment between neural codes.
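To make the pipeline in the abstract concrete, here is a minimal sketch of the core idea: two toy "models" embed the same sparse features into their neurons with different random mixing matrices (different superposition arrangements), an L1-penalized SAE recovers overcomplete latent codes for each, and a linear-regression alignment score is computed on base neurons versus SAE latents. Everything here (the toy data, the SAE architecture, ridge regression scored by mean R²) is an illustrative assumption, not the authors' code, and the printed numbers are not meant to reproduce the paper's results.

```python
# Sketch: compare linear-regression alignment on base neurons vs. SAE latents.
# Assumptions: toy sparse features, random linear mixing, L1-penalized ReLU SAE.
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Two "models" embed the same 32 sparse features into 16 neurons
# via different random mixing matrices (different superposition arrangements).
n_feat, n_neuron, n_samp = 32, 16, 4096
F = (rng.random((n_samp, n_feat)) < 0.1) * rng.random((n_samp, n_feat))
W1 = rng.normal(size=(n_feat, n_neuron))
W2 = rng.normal(size=(n_feat, n_neuron))
X1, X2 = F @ W1, F @ W2  # base-neuron activations of models 1 and 2

class SAE(nn.Module):
    """Overcomplete autoencoder; ReLU latents encourage a sparse code."""
    def __init__(self, d_in, d_hid):
        super().__init__()
        self.enc = nn.Linear(d_in, d_hid)
        self.dec = nn.Linear(d_hid, d_in)
    def forward(self, x):
        z = torch.relu(self.enc(x))
        return self.dec(z), z

def train_sae(X, d_hid=64, l1=1e-3, epochs=200):
    """Fit an SAE with reconstruction + L1 sparsity loss; return latents."""
    X_t = torch.tensor(X, dtype=torch.float32)
    sae = SAE(X.shape[1], d_hid)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        recon, z = sae(X_t)
        loss = ((recon - X_t) ** 2).mean() + l1 * z.abs().mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return sae(X_t)[1].numpy()  # sparse overcomplete latent codes

def linreg_alignment(X_src, X_tgt):
    """Mean R^2 of a ridge mapping from source units to target units."""
    pred = Ridge(alpha=1.0).fit(X_src, X_tgt).predict(X_src)
    return r2_score(X_tgt, pred)

Z1, Z2 = train_sae(X1), train_sae(X2)
print("base neurons -> base neurons:", linreg_alignment(X1, X2))
print("SAE latents  -> SAE latents :", linreg_alignment(Z1, Z2))
```

Under the paper's hypothesis, the latent-to-latent score should be at least as high as the base-to-base score, because the SAE undoes the model-specific mixing before the mapping metric is applied; the toy setup above is only meant to show where each piece sits in the pipeline.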
Similar Papers
Superposition as Lossy Compression: Measure with Sparse Autoencoders and Connect to Adversarial Vulnerability
Machine Learning (CS)
Measures how many ideas a computer brain can hold.
From superposition to sparse codes: interpretable representations in neural networks
Machine Learning (CS)
Helps computers understand what they see like humans.
Adversarial Attacks Leverage Interference Between Features in Superposition
Machine Learning (CS)
Shows how overlapping features make AI easier to trick.