Score: 0

Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study

Published: December 24, 2025 | arXiv ID: 2512.20948v1

By: Zhongren Dong , Haotian Guo , Weixiang Xu and more

Potential Business Impact:

Finds brain diseases early using voice and words.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

Neuropsychiatric disorders, such as Alzheimer's disease (AD), depression, and autism spectrum disorder (ASD), are characterized by linguistic and acoustic abnormalities, offering potential biomarkers for early detection. Despite the promise of multi-modal approaches, challenges like multi-lingual generalization and the absence of a unified evaluation framework persist. To address these gaps, we propose FEND (Foundation model-based Evaluation of Neuropsychiatric Disorders), a comprehensive multi-modal framework integrating speech and text modalities for detecting AD, depression, and ASD across the lifespan. Leveraging 13 multi-lingual datasets spanning English, Chinese, Greek, French, and Dutch, we systematically evaluate multi-modal fusion performance. Our results show that multi-modal fusion excels in AD and depression detection but underperforms in ASD due to dataset heterogeneity. We also identify modality imbalance as a prevalent issue, where multi-modal fusion fails to surpass the best mono-modal models. Cross-corpus experiments reveal robust performance in task- and language-consistent scenarios but noticeable degradation in multi-lingual and task-heterogeneous settings. By providing extensive benchmarks and a detailed analysis of performance-influencing factors, FEND advances the field of automated, lifespan-inclusive, and multi-lingual neuropsychiatric disorder assessment. We encourage researchers to adopt the FEND framework for fair comparisons and reproducible research.

It Hears, It Sees too: Multi-Modal LLM for Depression Detection By Integrating Visual Understanding into Audio Language Models

Multimedia

Helps computers spot depression from talking and faces.

25 Nov 2025 0

88%

It Hears, It Sees too: Multi-Modal LLM for Depression Detection By Integrating Visual Understanding into Audio Language Models

Multimedia

Helps computers detect sadness from voices and faces.

25 Nov 2025 0

88%

Multi-omic Prognosis of Alzheimer's Disease with Asymmetric Cross-Modal Cross-Attention Network

Image and Video Processing

Finds Alzheimer's early using brain scans and data.

9 Jul 2025 0

View PDF Login to Bookmark

Country of Origin

🇨🇳 China

Page Count

14 pages

Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study

Finds brain diseases early using voice and words.

Technical Abstract

It Hears, It Sees too: Multi-Modal LLM for Depression Detection By Integrating Visual Understanding into Audio Language Models

It Hears, It Sees too: Multi-Modal LLM for Depression Detection By Integrating Visual Understanding into Audio Language Models

Multi-omic Prognosis of Alzheimer's Disease with Asymmetric Cross-Modal Cross-Attention Network