A Semantic Information-based Hierarchical Speech Enhancement Method Using Factorized Codec and Diffusion Model

Published: May 20, 2025 | arXiv ID: 2505.13843v1

By: Yang Xiang, Canan Huang, Desheng Hu and more

Potential Business Impact:

Cleans up noisy speech for better understanding.

Business Areas:

Semantic Web, Internet Services

Most current speech enhancement (SE) methods recover clean speech from noisy inputs by directly estimating time-frequency masks or spectra. However, these approaches often neglect the distinct attributes inherent in speech signals, such as semantic content and acoustic details, which can hinder performance in downstream tasks. Moreover, their effectiveness tends to degrade in complex acoustic environments. To overcome these challenges, we propose a novel, semantic information-based, step-by-step SE method built on a factorized codec and a diffusion model. Unlike traditional SE methods, our hierarchical modeling of semantic and acoustic attributes enables more robust clean speech recovery, particularly in challenging acoustic scenarios. This method also offers advantages for downstream text-to-speech (TTS) tasks. Experimental results demonstrate that our algorithm not only outperforms state-of-the-art (SOTA) baselines in terms of speech quality but also enhances TTS performance in noisy environments.
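
For readers unfamiliar with the conventional approach the abstract contrasts itself with, the following is a minimal sketch of time-frequency masking on a single frame. It is not the paper's method. The frame length, the toy sinusoids, and the helper names `dft` and `ratio_mask` are assumptions made for illustration, and the mask is an oracle computed from the known clean and noise signals, whereas a real SE system would have a model estimate it from the noisy input alone.

```python
import cmath
import math

N = 16  # frame length in samples (arbitrary choice for this sketch)

def dft(frame):
    """Naive discrete Fourier transform of one frame (O(n^2), fine for a toy)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]

def ratio_mask(clean_spec, noise_spec, eps=1e-12):
    """Oracle ratio mask: per-bin share of the power that belongs to speech."""
    return [
        abs(s) ** 2 / (abs(s) ** 2 + abs(v) ** 2 + eps)
        for s, v in zip(clean_spec, noise_spec)
    ]

# Toy signals (hypothetical): "speech" is a tone in bin 2, "noise" a tone in bin 5.
clean = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
noise = [0.5 * math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
noisy = [c + v for c, v in zip(clean, noise)]

clean_spec = dft(clean)
noise_spec = dft(noise)
noisy_spec = dft(noisy)

# The mask is close to 1 where speech dominates and close to 0 where noise does.
mask = ratio_mask(clean_spec, noise_spec)

# Enhancement scales each noisy bin by its mask value.
enhanced = [m * y for m, y in zip(mask, noisy_spec)]
```

The mask treats every frequency bin independently and carries no notion of what is being said, which is the limitation the abstract points to when it argues for modeling semantic and acoustic attributes separately.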

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling

Audio and Speech Processing

Makes noisy speech clear using word meanings.

5 Feb 2025

89%

High-Fidelity Speech Enhancement via Discrete Audio Tokens

Sound

Cleans up noisy speech for better hearing.

2 Oct 2025

89%

A Novel Semantic Compression Approach for Ultra-low Bandwidth Voice Communication

Sound

Makes voices sound clear with less data.

18 Sep 2025

Country of Origin

🇨🇳 China

Page Count

5 pages
