Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding
By: Ebrahim Feghhi, Shreyas Kaasyap, Nima Hadidi, and more
Potential Business Impact:
Helps paralyzed people talk by reading brain signals.
Speech neuroprostheses aim to restore communication for people with severe paralysis by decoding speech directly from neural activity. To accelerate algorithmic progress, a recent benchmark released intracranial recordings from a paralyzed participant attempting to speak, along with a baseline decoding algorithm. Prior work on the benchmark showed impressive accuracy gains, but these gains came at increased computational cost and were not demonstrated in a real-time decoding setting. Here, we make three contributions that pave the way towards accurate, efficient, real-time neural speech decoding. First, we incorporate large amounts of time masking during training: on average, over $50\%$ of each trial is masked. Second, we replace the gated recurrent unit (GRU) architecture used in the baseline algorithm with a compact Transformer that uses $77\%$ fewer parameters, cuts peak GPU memory usage by a relative $36\%$, and calibrates significantly faster than the GRU. Third, we design a lightweight variant of an existing test-time adaptation method originally developed for decoding handwriting from neural activity. Our variant adapts the model using multiple time-masked augmentations of a single trial and requires only one gradient step per trial. Together, these contributions reduce word error rate by $19.5\%$ and effectively mitigate performance degradation across held-out days in a real-time decoding setting, while substantially lowering computational costs.
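The time-masking augmentation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact recipe: the span length cap and the ~50% coverage target are assumptions made for the example, and masked bins are simply zeroed.

```python
import numpy as np

def time_mask(trial, mask_frac=0.5, max_span=20, rng=None):
    """Zero out random contiguous time spans of a (T, C) neural trial
    until roughly `mask_frac` of the time bins are covered.

    Hypothetical sketch: span sampling and the mask-fraction target are
    illustrative choices, not the paper's published hyperparameters.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = trial.shape[0]
    covered = np.zeros(T, dtype=bool)
    while covered.mean() < mask_frac:
        span = int(rng.integers(1, max_span + 1))      # random span length
        start = int(rng.integers(0, max(T - span, 1))) # random span start
        covered[start:start + span] = True
    masked = trial.copy()
    masked[covered] = 0.0                              # zero masked bins
    return masked, covered
```

During training, each trial would be passed through an augmentation like this before being fed to the decoder, forcing the model to infer speech content from partial neural evidence.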
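The lightweight test-time adaptation variant, as described, averages a loss over multiple time-masked copies of a single trial and takes one gradient step. A minimal sketch under stated assumptions: a linear softmax decoder over time-averaged features stands in for the paper's Transformer, and entropy minimization stands in for its adaptation objective; all names here are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def tta_step(W, x, n_aug=8, mask_frac=0.5, lr=0.1, rng=None):
    """One entropy-minimization gradient step on masked copies of one trial.

    Hypothetical sketch: W is an (n_classes, C) linear decoder applied to
    time-averaged features of the (T, C) trial x; the real method adapts a
    Transformer, but the one-step, multi-augmentation structure is the same.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = x.shape[0]
    grad = np.zeros_like(W)
    for _ in range(n_aug):
        keep = rng.random(T) >= mask_frac        # random time masking
        feat = x[keep].mean(axis=0)              # time-averaged features
        p = softmax(W @ feat)
        H = entropy(p)
        g = -p * (np.log(p + 1e-12) + H)         # analytic dH/dlogits
        grad += np.outer(g, feat)
    return W - lr * grad / n_aug                 # single gradient step
```

Because only one gradient step is taken per trial, the adaptation cost stays small enough for a real-time decoding loop.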
Similar Papers
NeuroIncept Decoder for High-Fidelity Speech Reconstruction from Neural Activity
Sound
Lets people talk by reading their brain waves.
Masked Generative Nested Transformers with Decode Time Scaling
CV and Pattern Recognition
Makes AI image and video creation much faster.
Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation
CV and Pattern Recognition
Makes AI create pictures and videos much faster.