Fixed-Point RNNs: Interpolating from Diagonal to Dense
By: Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, and more
Potential Business Impact:
Lets AI sequence models track and remember more information while staying fast to train.
Linear recurrent neural networks (RNNs) and state-space models (SSMs) such as Mamba have become promising alternatives to softmax attention as sequence-mixing layers in Transformer architectures. Current models, however, do not exhibit the full state-tracking expressivity of RNNs because they rely on channel-wise (i.e., diagonal) sequence mixing. In this paper, we investigate parameterizations of a large class of dense linear RNNs as fixed points of parallelizable diagonal linear RNNs. The resulting models can naturally trade expressivity for efficiency at a fixed number of parameters and achieve state-of-the-art results on the commonly used toy tasks $A_5$, $S_5$, copying, and modular arithmetic.
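To make the fixed-point idea concrete, below is a minimal NumPy sketch, not the paper's implementation, of one way a dense linear recurrence $h_t = A h_{t-1} + x_t$ can be recovered by repeatedly solving a diagonal linear recurrence. The split of $A$ into a diagonal part $D$ and an off-diagonal remainder $R$, the iteration scheme, and all function names here are illustrative assumptions; each inner solve is a channel-wise recurrence of the kind that a parallel scan can compute efficiently.

```python
import numpy as np

def diagonal_scan(d, u):
    """Solve the diagonal linear recurrence h_t = d * h_{t-1} + u_t.
    Written sequentially here; this is the step a parallel scan would compute."""
    T, n = u.shape
    h = np.zeros((T, n))
    prev = np.zeros(n)
    for t in range(T):
        prev = d * prev + u[t]
        h[t] = prev
    return h

def dense_rnn_via_fixed_point(A, x, num_iters=50):
    """Approximate the dense linear RNN h_t = A h_{t-1} + x_t as the fixed
    point of repeated diagonal linear RNN solves (illustrative sketch).

    Split A = D + R (diagonal + off-diagonal). Each iteration solves
    h_t = D h_{t-1} + (R h_{t-1}^{prev} + x_t), feeding the previous
    iterate's states through the off-diagonal part as extra inputs.
    At a fixed point, h_t = (D + R) h_{t-1} + x_t = A h_{t-1} + x_t."""
    T, n = x.shape
    d = np.diag(A)                 # diagonal part: channel-wise mixing
    R = A - np.diag(d)             # off-diagonal part: cross-channel mixing
    h = np.zeros((T, n))           # initial iterate
    for _ in range(num_iters):
        # shift the previous iterate by one step so R acts on h_{t-1}
        h_shifted = np.vstack([np.zeros((1, n)), h[:-1]])
        u = h_shifted @ R.T + x
        h = diagonal_scan(d, u)
    return h

def dense_rnn_exact(A, x):
    """Ground-truth sequential dense recurrence, for comparison."""
    T, n = x.shape
    h = np.zeros((T, n))
    prev = np.zeros(n)
    for t in range(T):
        prev = A @ prev + x[t]
        h[t] = prev
    return h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, T = 4, 32
    A = 0.4 * rng.standard_normal((n, n)) / np.sqrt(n)  # keep the recurrence stable
    x = rng.standard_normal((T, n))
    approx = dense_rnn_via_fixed_point(A, x)
    exact = dense_rnn_exact(A, x)
    print("max abs error:", np.abs(approx - exact).max())
```

In this toy setup the iteration converges exactly once the number of fixed-point steps exceeds the sequence length, since each step propagates correct states one position further; in practice the appeal is that far fewer iterations of a cheap, parallelizable diagonal RNN can approximate the dense recurrence, which is the expressivity-for-efficiency trade-off the abstract describes.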
Similar Papers
Parity Requires Unified Input Dependence and Negative Eigenvalues in SSMs
Machine Learning (CS)
Makes computers remember past information better.
Block-Biased Mamba for Long-Range Sequence Processing
Machine Learning (CS)
Makes AI better at remembering long stories.
From S4 to Mamba: A Comprehensive Survey on Structured State Space Models
Machine Learning (CS)
Makes computers understand long stories faster.