Parity Requires Unified Input Dependence and Negative Eigenvalues in SSMs
By: Behnoush Khavari, Mehran Shakerinava, Jayesh Khullar, and more
Potential Business Impact:
Makes computers remember past information better.
Recent work has shown that linear recurrent neural network (LRNN) models such as S4D, Mamba, and DeltaNet lack state-tracking capability due to either time-invariant transition matrices or restricted eigenvalue ranges. To address this, input-dependent transition matrices, particularly complex or non-triangular ones, have been proposed to improve SSM performance on such tasks. Existing theorems show that input-independent SSMs and SSMs with non-negative eigenvalues are each incapable of solving simple state-tracking tasks such as parity, regardless of depth, but they do not address whether combining these two types of layers in a multilayer SSM could help. We investigate this question for efficient SSMs with diagonal transition matrices and show that such combinations still fail to solve parity. This implies that, to solve parity, a single recurrence layer must be both input-dependent and include negative eigenvalues. Our experiments support this conclusion through an analysis of an SSM model that combines S4D and Mamba layers.
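As a concrete illustration (not taken from the paper), the sketch below shows why parity seems to demand a transition that is both input-dependent and negative: a one-dimensional diagonal recurrence h_t = a(x_t) · h_{t-1} tracks parity exactly when a(1) = -1 and a(0) = 1, whereas a non-negative a(·) never flips the sign of the state and so cannot separate even from odd counts. The function names and the specific eigenvalue choices are illustrative assumptions, not the authors' construction.

```python
# Illustrative sketch (assumption, not the paper's code): parity via a
# 1-dimensional diagonal linear recurrence h_t = a(x_t) * h_{t-1}.
# The transition a(.) is BOTH input-dependent and negative (a(1) = -1);
# an input-independent or non-negative a(.) cannot track parity this way.

def parity_via_signed_recurrence(bits):
    """Return the parity of `bits` using an input-dependent sign-flip recurrence."""
    h = 1.0                              # initial state h_0
    for x in bits:
        a = -1.0 if x == 1 else 1.0      # input-dependent eigenvalue in {-1, +1}
        h = a * h                        # diagonal (scalar) state update
    return 0 if h > 0 else 1             # h = +1 -> even parity, h = -1 -> odd


def nonnegative_recurrence(bits, a0=0.5, a1=0.9):
    """Input-dependent but NON-NEGATIVE eigenvalues: the state only changes in
    magnitude, never in sign, so no readout of h recovers parity."""
    h = 1.0
    for x in bits:
        h = (a1 if x == 1 else a0) * h
    return h


if __name__ == "__main__":
    for bits in ([1, 0, 1, 1], [0, 0], [1, 1, 1]):
        print(bits, "parity =", parity_via_signed_recurrence(bits))
```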
Similar Papers
The Curious Case of In-Training Compression of State Space Models
Machine Learning (CS)
Shrinks computer models during learning for speed.
Fixed-Point RNNs: Interpolating from Diagonal to Dense
Machine Learning (CS)
Makes AI learn faster and remember more.
Rethinking the long-range dependency in Mamba/SSM and transformer models
Machine Learning (CS)
Makes computers remember longer, like brains do.