Hybrid Vision Transformer-Mamba Framework for Autism Diagnosis via Eye-Tracking Analysis
By: Wafaa Kasri, Yassine Himeur, Abigail Copiaco, and more
Potential Business Impact:
Finds autism faster using eye movements.
Accurate Autism Spectrum Disorder (ASD) diagnosis is vital for early intervention. This study presents a hybrid deep learning framework that combines Vision Transformers (ViT) and Vision Mamba to detect ASD from eye-tracking data. The model uses attention-based fusion to integrate visual, speech, and facial cues, capturing both spatial and temporal dynamics. Unlike traditional approaches built on handcrafted features, it applies state-of-the-art deep learning and explainable AI techniques to improve diagnostic accuracy and transparency. Evaluated on the Saliency4ASD dataset, the proposed ViT-Mamba model outperformed existing methods, achieving 0.96 accuracy, 0.95 F1-score, 0.97 sensitivity, and 0.94 specificity. These results demonstrate the model's promise for scalable, interpretable ASD screening, especially in resource-constrained or remote clinical settings where access to expert diagnosis is limited.
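To make the two-branch architecture concrete, below is a minimal PyTorch sketch under stated assumptions: a ViT-style branch encodes spatial structure from fixation/saliency images, a Mamba-style branch models the temporal gaze sequence, and cross-attention fuses the two before classification. Everything here is illustrative, not the authors' implementation: MambaLite is a simplified gated-recurrence stand-in for a real selective state-space layer, and the module names, dimensions, and fusion layout are assumptions.

import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """Split a fixation/saliency image into patches and embed them (ViT-style).

    Positional embeddings are omitted for brevity.
    """
    def __init__(self, img_size=224, patch=16, dim=128):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.proj(x)                       # (B, dim, H/p, W/p)
        return x.flatten(2).transpose(1, 2)    # (B, N, dim) patch tokens


class MambaLite(nn.Module):
    """Simplified stand-in for a selective state-space (Mamba) block:
    a gated linear recurrence scanned over the token sequence."""
    def __init__(self, dim=128):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)
        self.decay = nn.Parameter(torch.full((dim,), 0.9))  # learnable state decay
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, T, dim)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        h = torch.zeros_like(u[:, 0])
        states = []
        for t in range(u.size(1)):             # sequential scan over time steps
            h = torch.sigmoid(self.decay) * h + u[:, t]
            states.append(h)
        h_seq = torch.stack(states, dim=1)     # (B, T, dim) hidden states
        return self.out_proj(h_seq * torch.sigmoid(gate))  # gated output


class ViTMambaASD(nn.Module):
    """Spatial ViT branch + temporal Mamba-style branch, fused by cross-attention."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.patches = PatchEmbed(dim=dim)
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                         batch_first=True)
        self.vit = nn.TransformerEncoder(enc, num_layers=2)
        self.mamba = MambaLite(dim)
        self.gaze_proj = nn.Linear(2, dim)     # (x, y) gaze coords -> tokens
        self.fuse = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 2)          # ASD vs. typically developing

    def forward(self, image, gaze_seq):
        spatial = self.vit(self.patches(image))            # (B, N, dim)
        temporal = self.mamba(self.gaze_proj(gaze_seq))    # (B, T, dim)
        fused, _ = self.fuse(temporal, spatial, spatial)   # attention-based fusion
        return self.head(fused.mean(dim=1))                # class logits


model = ViTMambaASD()
img = torch.randn(2, 3, 224, 224)              # batch of saliency/fixation maps
gaze = torch.randn(2, 50, 2)                   # 50 gaze points per trial
print(model(img, gaze).shape)                  # torch.Size([2, 2])

The fusion step uses the temporal gaze tokens as queries over the spatial patch tokens, which is one plausible reading of "attention-based fusion." Sensitivity (TP / (TP + FN)) and specificity (TN / (TN + FP)) are the natural headline metrics here, since a screening tool must balance missed ASD cases against false alarms.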
Similar Papers
Exploring Image Transforms derived from Eye Gaze Variables for Progressive Autism Diagnosis
Image and Video Processing
Helps doctors spot autism faster at home.
Mamba-CNN: A Hybrid Architecture for Efficient and Accurate Facial Beauty Prediction
CV and Pattern Recognition
Makes computers judge faces as pretty or not.
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
CV and Pattern Recognition
Lets computers see Earth better from space.