Provably Extracting the Features from a General Superposition

Published: December 17, 2025 | arXiv ID: 2512.15987v1

By: Allen Liu

It is widely believed that complex machine learning models generally encode features through linear representations, but these features exist in superposition, which makes them challenging to recover. We study the following fundamental setting for learning features in superposition from black-box query access: we are given query access to a function \[ f(x)=\sum_{i=1}^n a_i\,\sigma_i(v_i^\top x), \] where each unit vector $v_i \in \mathbb{R}^d$ encodes a feature direction and each $\sigma_i:\mathbb{R} \rightarrow \mathbb{R}$ is an arbitrary response function, and our goal is to recover the $v_i$ and the function $f$. In learning-theoretic terms, superposition refers to the overcomplete regime, in which the number of features exceeds the underlying dimension (i.e., $n > d$); this regime has proven especially challenging for typical algorithmic approaches. Our main result is an efficient query algorithm that, from noisy oracle access to $f$, identifies all feature directions whose responses are non-degenerate and reconstructs the function $f$. Crucially, our algorithm works in a significantly more general setting than all related prior results: we allow for essentially arbitrary superpositions, requiring only that $v_i$ and $v_j$ are not nearly identical for $i \neq j$, and we allow general response functions $\sigma_i$. At a high level, our algorithm introduces an approach for searching in Fourier space by iteratively refining the search space to locate the hidden directions $v_i$.
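To make the query model concrete, the following is a minimal sketch of a toy instance of $f(x)=\sum_i a_i\,\sigma_i(v_i^\top x)$ in the overcomplete regime ($n > d$), together with a noisy query oracle. It illustrates only the problem setup, not the paper's recovery algorithm. The function names (`make_superposition_oracle`, `max_abs_overlap`), the sizes $d=8$ and $n=12$, the Gaussian noise model, and the particular response functions (tanh, sin, ReLU) are all assumptions chosen for illustration. The overlap check is one possible proxy for the requirement that no two directions are nearly identical.

```python
import math
import random


def make_superposition_oracle(d=8, n=12, noise_std=0.01, seed=0):
    """Build a toy f(x) = sum_i a_i * sigma_i(<v_i, x>) with n > d.

    All parameters here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)

    # Random unit vectors v_i in R^d (hypothetical feature directions).
    V = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in g))
        V.append([t / norm for t in g])

    # Random coefficients a_i and a mix of response functions sigma_i.
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    responses = [math.tanh, math.sin, lambda t: max(t, 0.0)]
    sigmas = [responses[i % len(responses)] for i in range(n)]

    def f(x):
        # Exact value of the hidden function at x.
        total = 0.0
        for a_i, sigma_i, v_i in zip(a, sigmas, V):
            proj = sum(v_k * x_k for v_k, x_k in zip(v_i, x))
            total += a_i * sigma_i(proj)
        return total

    def noisy_query(x):
        # What a learner would observe: f(x) plus assumed Gaussian noise.
        return f(x) + rng.gauss(0.0, noise_std)

    return V, a, f, noisy_query


def max_abs_overlap(V):
    """Largest |<v_i, v_j>| over i != j; values near 1 mean nearly identical directions."""
    best = 0.0
    for i in range(len(V)):
        for j in range(i + 1, len(V)):
            dot = sum(p * q for p, q in zip(V[i], V[j]))
            best = max(best, abs(dot))
    return best


V, a, f, noisy_query = make_superposition_oracle()
x = [1.0 / math.sqrt(8)] * 8  # a unit-norm query point in R^8
print(f(x), noisy_query(x))
print(max_abs_overlap(V))
```

A learner in this setting sees only `noisy_query`; the directions `V` and the coefficients `a` are returned here solely so that a recovery attempt can be checked against the ground truth.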

Superposition as Lossy Compression: Measure with Sparse Autoencoders and Connect to Adversarial Vulnerability

Machine Learning (CS)

Measures how many ideas a computer brain can hold.

15 Dec 2025 0

87%

Adversarial Attacks Leverage Interference Between Features in Superposition

Machine Learning (CS)

Makes AI easier to trick by how it learns.

13 Oct 2025 0

86%

Superposition disentanglement of neural representations reveals hidden alignment

Machine Learning (CS)

Helps computers understand brain signals better.

3 Oct 2025 0
