BlossomRec: Block-level Fused Sparse Attention Mechanism for Sequential Recommendations
By: Mengyang Ma, Xiaopeng Li, Wanyu Wang, and more
Transformer architectures have been widely used in sequential recommender systems (SRS). However, as user interaction histories grow, computational time and memory requirements grow with them, mainly because the standard attention mechanism scales quadratically with sequence length. Although many methods employ efficient attention or SSM-based models, these approaches struggle to model long sequences effectively and may exhibit unstable performance on short sequences. To address these challenges, we design a sparse attention mechanism, BlossomRec, which models both long-term and short-term user interests through attention computation to achieve stable performance across sequences of varying lengths. Specifically, we categorize user interests in recommender systems into long-term and short-term interests, compute them with two distinct sparse attention patterns, and combine the results through a learnable gated output. Theoretically, BlossomRec significantly reduces the number of interactions participating in attention computation. Extensive experiments on four public datasets demonstrate that BlossomRec, when integrated with state-of-the-art Transformer-based models, achieves comparable or even superior performance while significantly reducing memory usage, providing strong evidence of its efficiency and effectiveness. The code is available at https://github.com/ronineume/BlossomRec.
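To make the idea concrete, below is a minimal sketch (not the authors' implementation) of the mechanism described in the abstract: a local sliding-window attention pattern for short-term interests, a block-level sparse pattern for long-term interests, and a learnable gate that fuses the two outputs. The window size, block size, gate parameterization, and all names (`GatedSparseAttention`, `local_mask`, `block_mask`) are illustrative assumptions.

```python
# Hedged sketch of two sparse attention patterns fused by a learnable gate.
# All hyperparameters and the gating form are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


def sparse_attention(q, k, v, mask):
    # q, k, v: (batch, heads, seq_len, head_dim); mask: (seq_len, seq_len) bool,
    # True where attention is allowed.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v


def local_mask(seq_len, window):
    # Causal sliding window: each position attends to its `window` most recent items
    # (short-term interest).
    idx = torch.arange(seq_len)
    diff = idx.unsqueeze(1) - idx.unsqueeze(0)
    return (diff >= 0) & (diff < window)


def block_mask(seq_len, block):
    # Block-level causal pattern: each position attends to the last item of every
    # preceding block, a coarse summary of distant history (long-term interest).
    # Self-attention is always allowed so every row has at least one valid key.
    idx = torch.arange(seq_len)
    is_block_end = (idx + 1) % block == 0
    causal = idx.unsqueeze(1) >= idx.unsqueeze(0)
    return (causal & is_block_end.unsqueeze(0)) | torch.eye(seq_len, dtype=torch.bool)


class GatedSparseAttention(nn.Module):
    """Fuse short-term (local) and long-term (block-level) sparse attention with a gate."""

    def __init__(self, dim, heads=2, window=8, block=16):
        super().__init__()
        self.heads, self.window, self.block = heads, window, block
        self.qkv = nn.Linear(dim, 3 * dim)
        self.gate = nn.Linear(dim, 1)  # learnable per-token gate (assumed form)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, d = x.shape
        h = self.heads
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, h, d // h).transpose(1, 2) for t in (q, k, v))
        short = sparse_attention(q, k, v, local_mask(n, self.window))
        long = sparse_attention(q, k, v, block_mask(n, self.block))
        g = torch.sigmoid(self.gate(x)).unsqueeze(1)   # (b, 1, n, 1), broadcast over heads
        fused = g * short + (1 - g) * long
        return self.out(fused.transpose(1, 2).reshape(b, n, d))


# Usage on a toy batch of item embeddings.
x = torch.randn(4, 64, 32)        # (batch, sequence length, embedding dim)
attn = GatedSparseAttention(dim=32)
print(attn(x).shape)              # torch.Size([4, 64, 32])
```

Because each query only scores keys allowed by its mask, the number of interactions per position is bounded by the window size plus the number of preceding blocks, rather than the full sequence length, which is the efficiency argument the abstract makes.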
Similar Papers
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
Computation and Language
Makes AI understand much longer stories.
Why Generate When You Can Transform? Unleashing Generative Attention for Dynamic Recommendation
Information Retrieval
Helps apps guess what you'll like next.
Understanding and Enhancing Mamba-Transformer Hybrids for Memory Recall and Language Modeling
Computation and Language
Makes AI understand long stories better.