Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding
By: Ebrahim Feghhi, Shreyas Kaasyap, Nima Hadidi, and more
Potential Business Impact:
Helps paralyzed people talk by reading brain signals.
Speech neuroprostheses aim to restore communication for people with severe paralysis by decoding speech directly from neural activity. To accelerate algorithmic progress, a recent benchmark released intracranial recordings from a paralyzed participant attempting to speak, along with a baseline decoding algorithm. Prior work on the benchmark showed impressive accuracy gains, but these gains came at increased computational cost and were not demonstrated in a real-time decoding setting. Here, we make three contributions that pave the way towards accurate, efficient, real-time neural speech decoding. First, we incorporate large amounts of time masking during training: on average, over $50\%$ of each trial is masked. Second, we replace the gated recurrent unit (GRU) architecture used in the baseline algorithm with a compact Transformer that uses $77\%$ fewer parameters, cuts peak GPU memory usage by a relative $36\%$, and calibrates significantly faster than the GRU. Third, we design a lightweight variant of an existing test-time adaptation method originally developed for decoding handwriting from neural activity. Our variant adapts the model using multiple time-masked augmentations of a single trial and requires only one gradient step per trial. Together, these contributions reduce word error rate by $19.5\%$ and effectively mitigate performance degradation across held-out days in a real-time decoding setting, while substantially lowering computational costs.
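The time-masking augmentation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact recipe: the span length cap and the ~50% coverage target are assumptions made for the example, and masked bins are simply zeroed.

```python
import numpy as np

def time_mask(trial, mask_frac=0.5, max_span=20, rng=None):
    """Zero out random contiguous time spans of a (T, C) neural trial
    until roughly `mask_frac` of the time bins are covered.

    Hypothetical sketch: span sampling and the mask-fraction target are
    illustrative choices, not the paper's published hyperparameters.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = trial.shape[0]
    covered = np.zeros(T, dtype=bool)
    while covered.mean() < mask_frac:
        span = int(rng.integers(1, max_span + 1))      # random span length
        start = int(rng.integers(0, max(T - span, 1))) # random span start
        covered[start:start + span] = True
    masked = trial.copy()
    masked[covered] = 0.0                              # zero masked bins
    return masked, covered
```

During training, each trial would be passed through an augmentation like this before being fed to the decoder, forcing the model to infer speech content from partial neural evidence.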
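The lightweight test-time adaptation variant, as described, averages a loss over multiple time-masked copies of a single trial and takes one gradient step. A minimal sketch under stated assumptions: a linear softmax decoder over time-averaged features stands in for the paper's Transformer, and entropy minimization stands in for its adaptation objective; all names here are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def tta_step(W, x, n_aug=8, mask_frac=0.5, lr=0.1, rng=None):
    """One entropy-minimization gradient step on masked copies of one trial.

    Hypothetical sketch: W is an (n_classes, C) linear decoder applied to
    time-averaged features of the (T, C) trial x; the real method adapts a
    Transformer, but the one-step, multi-augmentation structure is the same.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = x.shape[0]
    grad = np.zeros_like(W)
    for _ in range(n_aug):
        keep = rng.random(T) >= mask_frac        # random time masking
        feat = x[keep].mean(axis=0)              # time-averaged features
        p = softmax(W @ feat)
        H = entropy(p)
        g = -p * (np.log(p + 1e-12) + H)         # analytic dH/dlogits
        grad += np.outer(g, feat)
    return W - lr * grad / n_aug                 # single gradient step
```

Because only one gradient step is taken per trial, the adaptation cost stays small enough for a real-time decoding loop.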
Similar Papers
NeuroIncept Decoder for High-Fidelity Speech Reconstruction from Neural Activity
Sound
Lets people talk by reading their brain waves.
Masked Generative Nested Transformers with Decode Time Scaling
CV and Pattern Recognition
Makes AI image and video creation much faster.
Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation
CV and Pattern Recognition
Makes AI create pictures and videos much faster.