A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data

Published: December 8, 2025 | arXiv ID: 2512.07741v1

By: Agnes Norbury, George Fairs, Alexandra L. Georgescu and more

Potential Business Impact:

Helps doctors hear mental health problems in voices.

Business Areas:

Speech Recognition, Data and Analytics, Software

During psychiatric assessment, clinicians observe not only what patients report, but also important nonverbal signs such as tone, speech rate, fluency, responsiveness, and body language. Weighing and integrating these different information sources is a challenging task and a good candidate for support by intelligence-driven tools; however, this is yet to be realized in the clinic. Here, we argue that several important barriers to adoption can be addressed using Bayesian network modelling. To demonstrate this, we evaluate a model for depression and anxiety symptom prediction from voice and speech features in large-scale datasets (30,135 unique speakers). Alongside performance for conditions and symptoms (for depression and anxiety respectively, ROC-AUC = 0.842 and 0.831, ECE = 0.018 and 0.015; core individual symptom ROC-AUC > 0.74), we assess demographic fairness and investigate integration across, and redundancy between, different input modality types. Clinical usefulness metrics and acceptability to mental health service users are explored. When provided with sufficiently rich and large-scale multimodal data streams and specified to represent common mental conditions at the symptom rather than disorder level, such models are a principled approach for building robust assessment support tools: providing clinically relevant outputs in a transparent and explainable format that is directly amenable to expert clinical supervision.
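The two headline metrics in the abstract measure different things: ROC-AUC is discrimination (how well scores rank cases above non-cases), and ECE, expected calibration error, is how closely predicted probabilities match observed rates. The sketch below is an illustration only, not the authors' evaluation code. It assumes binary labels, predicted probabilities in [0, 1], and equal-width binning for ECE (the paper's binning scheme is not stated here). The function names and the toy data are made up for the example.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def expected_calibration_error(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> float:
    """Equal-width binned ECE (an assumed binning scheme).

    For each bin, take |observed positive rate - mean predicted probability|
    and average over bins, weighted by the share of samples in each bin.
    """
    if len(labels) == 0:
        raise ValueError("need at least one sample")
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    total = len(labels)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / total) * abs(observed - predicted)
    return ece


# Toy data, invented for illustration (not from the paper).
toy_labels = [0, 0, 1, 1, 0, 1, 1, 0]
toy_probs = [0.1, 0.3, 0.35, 0.8, 0.2, 0.9, 0.7, 0.6]
print("ROC-AUC:", roc_auc(toy_labels, toy_probs))
print("ECE:", expected_calibration_error(toy_labels, toy_probs, n_bins=5))
```

A model can rank well yet be poorly calibrated, which is why a tool that reports probabilities to clinicians would typically be judged on both.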

Exploring Machine Learning and Language Models for Multimodal Depression Detection

Computation and Language

Finds sadness in voices, faces, and words.

28 Aug 2025

89%

It Hears, It Sees too: Multi-Modal LLM for Depression Detection By Integrating Visual Understanding into Audio Language Models

Multimedia

Helps computers detect sadness from voices and faces.

25 Nov 2025

89%

Speech-Based Depressive Mood Detection in the Presence of Multiple Sclerosis: A Cross-Corpus and Cross-Lingual Study

Computation and Language

Helps find depression in people with MS.

25 Aug 2025

Page Count

28 pages
