MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling
By: Mahdi Karami, Ali Behrouz, Peilin Zhong, and more
State-space models (SSMs) have recently gained attention as an efficient alternative to computationally expensive attention-based models for sequence modeling. They rely on linear recurrences to integrate information over time, enabling fast inference, parallelizable training, and control over recurrence stability. However, traditional SSMs often suffer from limited effective memory, requiring larger state sizes for improved recall. Moreover, existing SSMs struggle to capture multi-scale dependencies, which are essential for modeling complex structures in time series, images, and natural language. This paper introduces a multi-scale SSM framework (MS-SSM) that addresses these limitations by representing sequence dynamics across multiple resolutions and processing each resolution with specialized state-space dynamics. By capturing both fine-grained, high-frequency patterns and coarse, global trends, MS-SSM enhances memory efficiency and long-range modeling. We further introduce an input-dependent scale-mixer, enabling dynamic information fusion across resolutions. The proposed approach significantly improves sequence modeling, particularly in long-range and hierarchical tasks, while maintaining computational efficiency. Extensive experiments on benchmarks, including Long Range Arena, hierarchical reasoning, time series classification, and image recognition, demonstrate that MS-SSM consistently outperforms prior SSM-based models, highlighting the benefits of multi-resolution processing in state-space architectures.
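The abstract describes three ingredients: per-resolution state-space dynamics, multi-resolution views of the input, and an input-dependent scale-mixer that fuses the resolutions. The following is a minimal illustrative sketch of that structure, not the authors' implementation; the downsampling by average pooling, the diagonal linear recurrence, the nearest-neighbor upsampling, and the softmax mixer `MultiScaleSSMBlock.mixer` are all assumptions made for the example.

```python
# Hypothetical sketch of a multi-scale SSM block (not the paper's code).
# Assumed design: average-pool the sequence to coarser resolutions, run a
# simple diagonal linear SSM per scale, upsample back, and fuse the scales
# with an input-dependent softmax mixer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiagonalSSM(nn.Module):
    """Diagonal linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = <c, h_t>."""

    def __init__(self, dim: int, state_size: int = 16):
        super().__init__()
        self.log_a = nn.Parameter(torch.randn(dim, state_size) * 0.1 - 1.0)
        self.b = nn.Parameter(torch.randn(dim, state_size) * 0.1)
        self.c = nn.Parameter(torch.randn(dim, state_size) * 0.1)

    def forward(self, x):  # x: (batch, length, dim)
        a = torch.sigmoid(self.log_a)                  # keep recurrence in (0, 1) for stability
        h = torch.zeros(x.shape[0], x.shape[2], a.shape[1], device=x.device)
        ys = []
        for t in range(x.shape[1]):                    # sequential scan, written for clarity not speed
            h = a * h + self.b * x[:, t, :, None]
            ys.append((h * self.c).sum(-1))
        return torch.stack(ys, dim=1)                  # (batch, length, dim)


class MultiScaleSSMBlock(nn.Module):
    """Run an SSM at several resolutions and fuse them with an input-dependent mixer."""

    def __init__(self, dim: int, num_scales: int = 3, state_size: int = 16):
        super().__init__()
        self.ssms = nn.ModuleList(DiagonalSSM(dim, state_size) for _ in range(num_scales))
        self.mixer = nn.Linear(dim, num_scales)        # per-token weights over scales

    def forward(self, x):  # x: (batch, length, dim)
        length = x.shape[1]
        outs = []
        for s, ssm in enumerate(self.ssms):
            factor = 2 ** s
            # Coarser scales see an average-pooled (lower-resolution) sequence.
            xs = x if factor == 1 else F.avg_pool1d(
                x.transpose(1, 2), factor, stride=factor).transpose(1, 2)
            ys = ssm(xs)
            # Upsample back to the original length before fusion.
            ys = F.interpolate(ys.transpose(1, 2), size=length, mode="nearest").transpose(1, 2)
            outs.append(ys)
        weights = F.softmax(self.mixer(x), dim=-1)         # (batch, length, num_scales)
        stacked = torch.stack(outs, dim=-1)                # (batch, length, dim, num_scales)
        return (stacked * weights.unsqueeze(2)).sum(-1)    # input-dependent fusion across scales


if __name__ == "__main__":
    block = MultiScaleSSMBlock(dim=32)
    y = block(torch.randn(4, 64, 32))
    print(y.shape)  # torch.Size([4, 64, 32])
```

In this reading, the coarse scales carry slowly varying, global trends while the finest scale preserves high-frequency detail, and the mixer decides per token how much each resolution contributes; the actual MS-SSM parameterization and fusion may differ.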
Similar Papers
The Curious Case of In-Training Compression of State Space Models
Machine Learning (CS)
Shrinks computer models during learning for speed.
From S4 to Mamba: A Comprehensive Survey on Structured State Space Models
Machine Learning (CS)
Makes computers understand long stories faster.
How Many Heads Make an SSM? A Unified Framework for Attention and State Space Models
Machine Learning (CS)
Helps computers understand long sentences better.