FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer
By: Haoxu Wang, Yiheng Jiang, Gang Qiao, and more
Potential Business Impact:
Cleans up noisy audio faster and with less power.
Speech separation must cope with long time sequences. Previous methods reduce sequence length and rely on the Transformer to capture global information, but because the attention module has quadratic time complexity, memory usage and inference time still grow significantly with longer segments. To address this, we introduce Focused Linear Attention and build FLASepformer, which has linear complexity, for efficient speech separation. Inspired by SepReformer and TF-Locoformer, we propose two variants: FLA-SepReformer and FLA-TFLocoformer. We also add a new Gated module to further improve performance. Experimental results on various datasets show that FLASepformer matches state-of-the-art performance with lower memory consumption and faster inference. FLA-SepReformer-T/B/L achieves speedups of 2.29x, 1.91x, and 1.49x while using only 15.8%, 20.9%, and 31.9% of the GPU memory, demonstrating the model's effectiveness.
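To illustrate why linear attention avoids the quadratic cost mentioned above, here is a minimal PyTorch sketch of gated focused linear attention. It assumes a focused feature map in the style of the FLatten Transformer (ReLU plus a norm-preserving power) and a sigmoid output gate; the module name, the focusing power, and the gating placement are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFocusedLinearAttention(nn.Module):
    """Sketch of linear attention with a focused feature map and an output gate."""

    def __init__(self, dim, num_heads=4, focusing_power=3.0, eps=1e-6):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.p = focusing_power
        self.eps = eps
        self.qkv = nn.Linear(dim, dim * 3)
        self.gate = nn.Linear(dim, dim)   # assumed sigmoid output gate
        self.proj = nn.Linear(dim, dim)

    def focus(self, x):
        # Focused feature map: ReLU followed by a norm-preserving power.
        # Keeping the kernel non-negative lets attention be factorized linearly,
        # while the power sharpens the attention distribution.
        x = F.relu(x) + self.eps
        norm = x.norm(dim=-1, keepdim=True)
        x = x ** self.p
        return x / (x.norm(dim=-1, keepdim=True) + self.eps) * norm

    def forward(self, x):                      # x: (batch, time, dim)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, t, self.num_heads, self.head_dim)
        q = self.focus(q.view(shape)).transpose(1, 2)   # (b, h, t, hd)
        k = self.focus(k.view(shape)).transpose(1, 2)
        v = v.view(shape).transpose(1, 2)
        # Associate (K^T V) first, so the cost is O(T * d^2) in sequence
        # length T rather than the O(T^2 * d) of softmax attention.
        kv = torch.einsum("bhtd,bhte->bhde", k, v)
        z = 1.0 / (torch.einsum("bhtd,bhd->bht", q, k.sum(dim=2)) + self.eps)
        out = torch.einsum("bhtd,bhde,bht->bhte", q, kv, z)
        out = out.transpose(1, 2).reshape(b, t, d)
        # Gated output, loosely mirroring the added Gated module.
        return self.proj(torch.sigmoid(self.gate(x)) * out)

# usage sketch: y = GatedFocusedLinearAttention(dim=128)(torch.randn(2, 16000, 128))
```

Because the key-value product is accumulated before multiplying by the queries, memory and compute grow linearly with the segment length, which is the property the abstract's speed and GPU-memory figures rely on.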
Similar Papers
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Machine Learning (CS)
Makes computers understand long texts much faster.
Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention
Machine Learning (CS)
Makes AI understand long stories faster.
GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory
Machine Learning (CS)
Makes AI models learn faster and remember more.