FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer
By: Haoxu Wang, Yiheng Jiang, Gang Qiao, and more
Potential Business Impact:
Cleans up noisy audio faster and with less power.
Speech separation must cope with long time sequences. Previous methods reduce sequence length and rely on the Transformer to capture global information, but because the attention module has quadratic time complexity, memory usage and inference time still grow significantly with longer segments. To address this, we introduce Focused Linear Attention and build FLASepformer, which has linear complexity, for efficient speech separation. Inspired by SepReformer and TF-Locoformer, we propose two variants: FLA-SepReformer and FLA-TFLocoformer. We also add a new Gated module to further improve performance. Experimental results on various datasets show that FLASepformer matches state-of-the-art performance with lower memory consumption and faster inference. FLA-SepReformer-T/B/L achieves speedups of 2.29x, 1.91x, and 1.49x while using only 15.8%, 20.9%, and 31.9% of the GPU memory, demonstrating the model's effectiveness.
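To illustrate why linear attention avoids the quadratic cost mentioned above, here is a minimal PyTorch sketch of gated focused linear attention. It assumes a focused feature map in the style of the FLatten Transformer (ReLU plus a norm-preserving power) and a sigmoid output gate; the module name, the focusing power, and the gating placement are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFocusedLinearAttention(nn.Module):
    """Sketch of linear attention with a focused feature map and an output gate."""

    def __init__(self, dim, num_heads=4, focusing_power=3.0, eps=1e-6):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.p = focusing_power
        self.eps = eps
        self.qkv = nn.Linear(dim, dim * 3)
        self.gate = nn.Linear(dim, dim)   # assumed sigmoid output gate
        self.proj = nn.Linear(dim, dim)

    def focus(self, x):
        # Focused feature map: ReLU followed by a norm-preserving power.
        # Keeping the kernel non-negative lets attention be factorized linearly,
        # while the power sharpens the attention distribution.
        x = F.relu(x) + self.eps
        norm = x.norm(dim=-1, keepdim=True)
        x = x ** self.p
        return x / (x.norm(dim=-1, keepdim=True) + self.eps) * norm

    def forward(self, x):                      # x: (batch, time, dim)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, t, self.num_heads, self.head_dim)
        q = self.focus(q.view(shape)).transpose(1, 2)   # (b, h, t, hd)
        k = self.focus(k.view(shape)).transpose(1, 2)
        v = v.view(shape).transpose(1, 2)
        # Associate (K^T V) first, so the cost is O(T * d^2) in sequence
        # length T rather than the O(T^2 * d) of softmax attention.
        kv = torch.einsum("bhtd,bhte->bhde", k, v)
        z = 1.0 / (torch.einsum("bhtd,bhd->bht", q, k.sum(dim=2)) + self.eps)
        out = torch.einsum("bhtd,bhde,bht->bhte", q, kv, z)
        out = out.transpose(1, 2).reshape(b, t, d)
        # Gated output, loosely mirroring the added Gated module.
        return self.proj(torch.sigmoid(self.gate(x)) * out)

# usage sketch: y = GatedFocusedLinearAttention(dim=128)(torch.randn(2, 16000, 128))
```

Because the key-value product is accumulated before multiplying by the queries, memory and compute grow linearly with the segment length, which is the property the abstract's speed and GPU-memory figures rely on.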
Similar Papers
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Machine Learning (CS)
Makes computers understand long texts much faster.
Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention
Machine Learning (CS)
Makes AI understand long stories faster.
GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory
Machine Learning (CS)
Makes AI models learn faster and remember more.