Sparse Multi-Modal Transformer with Masking for Alzheimer's Disease Classification
By: Cheng-Han Lu, Pei-Hsuan Tsai
Transformer-based multi-modal intelligent systems often suffer from high computational and energy costs due to dense self-attention, limiting their scalability under resource constraints. This paper presents SMMT, a sparse multi-modal transformer architecture designed to improve efficiency and robustness. Building on a cascaded multi-modal transformer framework, SMMT introduces cluster-based sparse attention to achieve near-linear computational complexity, together with modality-wise masking to improve robustness against incomplete inputs. The architecture is evaluated on Alzheimer's Disease classification using the ADNI dataset as a representative multi-modal case study. Experimental results show that SMMT maintains competitive predictive performance while significantly reducing training time, memory usage, and energy consumption compared to dense-attention baselines, demonstrating its suitability as a resource-aware architectural component for scalable intelligent systems.
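The two mechanisms the abstract names can be illustrated in miniature. The sketch below is not the paper's implementation; function names, the single-step cluster assignment, and the shared Q/K/V projection are simplifying assumptions made here for brevity. It shows (1) cluster-based sparse attention, where each token attends only within its cluster so the quadratic attention cost is paid on small groups rather than the full sequence, and (2) modality-wise masking, where an entire modality's features are blanked to simulate incomplete inputs.

```python
import numpy as np


def cluster_sparse_attention(x, n_clusters=4, seed=0):
    """Toy cluster-based sparse attention (illustrative, not SMMT itself).

    Tokens are grouped by nearest centroid, then softmax attention runs
    only within each cluster, shrinking the cost from O(n^2) toward
    O(n^2 / k) for k roughly balanced clusters.
    """
    n, d = x.shape
    rng = np.random.default_rng(seed)
    # One k-means-style assignment step from randomly chosen centroids.
    centroids = x[rng.choice(n, n_clusters, replace=False)]
    labels = ((x[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(axis=1)
    out = np.zeros_like(x)
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if idx.size == 0:
            continue
        q = k = v = x[idx]  # shared projections kept for brevity
        scores = q @ k.T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[idx] = w @ v
    return out


def modality_mask(modalities, drop, fill=0.0):
    """Toy modality-wise masking: blank out whole modalities (by index)
    so a model can be trained to tolerate missing inputs."""
    return [np.full_like(m, fill) if i in drop else m
            for i, m in enumerate(modalities)]
```

For example, masking the second of two modality feature matrices leaves the first untouched while replacing the second with the fill value, mimicking a missing imaging or clinical stream at inference time.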