Features Emerge as Discrete States: The First Application of SAEs to 3D Representations
By: Albert Miao, Chenliang Zhou, Jiawei Zhou, et al.
Sparse autoencoders (SAEs) are a powerful dictionary-learning technique for decomposing neural network activations, translating hidden states into human-interpretable concepts of high semantic value without any external intervention or guidance. However, the technique has rarely been applied outside the textual domain, limiting theoretical exploration of feature decomposition. We present the first application of SAEs to the 3D domain, analyzing the features used by a state-of-the-art 3D reconstruction VAE applied to 53k 3D models from the Objaverse dataset. We observe that the network encodes discrete rather than continuous features, leading to our key finding: such models approximate a discrete state space, driven by phase-like transitions in feature activations. Through this state-transition framework, we explain three otherwise unintuitive behaviors: the reconstruction model's inclination toward positional-encoding representations, the sigmoidal behavior of reconstruction loss under feature ablation, and the bimodality in the distribution of phase-transition points. This last observation suggests that the model redistributes the interference caused by superposition to prioritize the saliency of different features. Our work not only compiles and explains unexpected phenomena in feature decomposition, but also provides a framework for understanding the model's feature-learning dynamics. The code and the dataset of encoded 3D objects will be made available on release.
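The paper's own code is not reproduced here. As a rough illustration of the technique the abstract describes, the sketch below shows a minimal sparse autoencoder trained on a batch of activation vectors, followed by a simple feature-ablation probe of reconstruction error. All dimensions, hyperparameters, and the placeholder activations are illustrative assumptions, not the authors' actual configuration for the 3D reconstruction VAE.

```python
# Minimal sparse autoencoder (SAE) sketch in PyTorch.
# Dimensions, hyperparameters, and the random "activations" below are
# placeholders, not the paper's setup.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, act_dim: int, dict_size: int):
        super().__init__()
        # Overcomplete dictionary: dict_size >> act_dim.
        self.encoder = nn.Linear(act_dim, dict_size)
        self.decoder = nn.Linear(dict_size, act_dim, bias=False)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative and sparse.
        f = torch.relu(self.encoder(x))
        return self.decoder(f), f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # Reconstruction error plus an L1 sparsity penalty on feature activations.
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().mean()

# Decompose a batch of hypothetical VAE latent activations.
acts = torch.randn(256, 512)                     # placeholder (batch, act_dim)
sae = SparseAutoencoder(act_dim=512, dict_size=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
for _ in range(100):                             # training-loop sketch
    x_hat, f = sae(acts)
    loss = sae_loss(acts, x_hat, f)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Feature-ablation probe: zero out one dictionary feature and measure
# how reconstruction error changes (arbitrary choice of feature 0).
with torch.no_grad():
    _, f = sae(acts)
    f_ablated = f.clone()
    f_ablated[:, 0] = 0.0
    ablation_err = ((sae.decoder(f_ablated) - acts) ** 2).mean()
```

In the abstract's framing, the sigmoidal, phase-transition-like behavior appears when a feature's activation is swept between zero and its full value while tracking reconstruction loss; the ablation above only probes the two endpoints of such a sweep.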
Similar Papers
Probing the Representational Power of Sparse Autoencoders in Vision Models
CV and Pattern Recognition
Probes how well sparse autoencoder features capture the representations learned by vision models.
Group Equivariance Meets Mechanistic Interpretability: Equivariant Sparse Autoencoders
Machine Learning (CS)
Applies group-equivariant sparse autoencoders to mechanistic interpretability.