
Acoustic to Articulatory Inversion of Speech; Data Driven Approaches, Challenges, Applications, and Future Scope

Published: April 17, 2025 | arXiv ID: 2504.13308v1

By: Leena G Pillai, D. Muhammad Noorul Mubarak

Potential Business Impact:

Helps people learn to speak better by showing tongue movements.

Business Areas:

Speech Recognition, Data and Analytics, Software

This review focuses on data-driven approaches to Acoustic-to-Articulatory Inversion (AAI) of speech and their applications. It considers relevant works published over the last ten years (2011-2021). The selection criteria include:
(a) Type of AAI: speaker-dependent and speaker-independent AAI.
(b) Objectives of the work: articulatory approximation, articulatory feature-space selection and Automatic Speech Recognition (ASR), exploration of the correlation between acoustic and articulatory features, and frameworks for computer-assisted language training.
(c) Corpus: speech (wav) recorded simultaneously with articulatory measurement or medical imaging modalities such as ElectroMagnetic Articulography (EMA), Electropalatography (EPG), Laryngography, Electroglottography (EGG), X-ray Cineradiography, Ultrasound, and real-time Magnetic Resonance Imaging (rtMRI).
(d) Methods or models: only recent works are considered, so all of them are based on machine learning.
(e) Evaluation: because AAI is a non-linear regression problem, performance is mostly evaluated with the Correlation Coefficient (CC) and Root Mean Square Error (RMSE); Mean Square Error (MSE) and Mean Format Error (MFE) are also considered.
In practice, an AAI model can provide a more user-friendly, interpretable visual feedback system for articulatory positions, especially tongue movement. Such a trajectory feedback system can be used to provide phonetic, language, and speech therapy for pathological subjects.
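Since the review notes that AAI output is mostly scored with CC and RMSE (with MSE also used), a minimal Python sketch of those metrics may help make them concrete. It assumes predicted and measured articulatory trajectories are NumPy arrays shaped (frames, channels), for example EMA sensor coordinates. The helper name `aai_metrics` is hypothetical, MFE is not shown, and individual studies may normalize or average differently (for example per speaker or per articulator).

```python
import numpy as np


def aai_metrics(predicted: np.ndarray, measured: np.ndarray) -> dict:
    """Hypothetical helper: per-channel MSE, RMSE and Pearson CC between
    predicted and measured articulatory trajectories, shaped (frames, channels)."""
    if predicted.shape != measured.shape:
        raise ValueError("predicted and measured must have the same shape")
    err = predicted - measured
    mse = np.mean(err ** 2, axis=0)   # mean squared error per channel
    rmse = np.sqrt(mse)               # same units as the articulatory data
    # Pearson correlation per channel: center each trajectory, then
    # take the normalized dot product of the centered signals.
    p = predicted - predicted.mean(axis=0)
    m = measured - measured.mean(axis=0)
    cc = (p * m).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
    return {
        "mse": mse,
        "rmse": rmse,
        "cc": cc,
        # Studies often report the average over articulatory channels.
        "mean_rmse": float(rmse.mean()),
        "mean_cc": float(cc.mean()),
    }


if __name__ == "__main__":
    # Synthetic two-channel "trajectory" (e.g. tongue-tip x and y), 200 frames.
    t = np.linspace(0.0, 1.0, 200)
    measured = np.stack([np.sin(2 * np.pi * 3 * t), np.cos(2 * np.pi * 2 * t)], axis=1)
    predicted = measured + 0.1  # constant offset error
    scores = aai_metrics(predicted, measured)
    print(scores["rmse"], scores["cc"])
```

In this example, a prediction equal to the measured trajectory plus a constant 0.1 offset gets an RMSE of about 0.1 per channel but a CC of about 1.0, because Pearson correlation ignores offset and scale. This is one reason the two metrics are usually reported together.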

Articulation-Informed ASR: Integrating Articulatory Features into ASR via Auxiliary Speech Inversion and Cross-Attention Fusion
Audio and Speech Processing | 1 Oct 2025
Helps computers understand speech better by "seeing" mouth movements.

Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
Computation and Language | 16 May 2025
Helps computers understand many people talking at once.

Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation
Audio and Speech Processing | 11 Oct 2025
Makes computers understand spoken words better.


Page Count: 8 pages
