Unsupervised Multi-channel Speech Dereverberation via Diffusion
By: Yulun Wu, Zhongweiyang Xu, Jianchong Chen, and more
Potential Business Impact:
Clears echoes from voices in recordings.
We consider the problem of multi-channel, single-speaker blind dereverberation, where multi-channel mixtures are used to recover the clean anechoic speech. To solve this problem, we propose USD-DPS (Unsupervised Speech Dereverberation via Diffusion Posterior Sampling). USD-DPS uses an unconditional clean-speech diffusion model as a strong prior and solves the problem by posterior sampling. At each diffusion sampling step, we estimate the room impulse responses (RIRs) of all microphone channels, which are then used to enforce a multi-channel mixture-consistency constraint for diffusion guidance. For multi-channel RIR estimation, we estimate the reference-channel RIR by optimizing the parameters of a sub-band RIR signal model with the Adam optimizer, and we estimate the non-reference channels' RIRs analytically using forward convolutive prediction (FCP). We found that this combination provides a good balance between sampling efficiency and RIR prior modeling, and it achieves superior performance among unsupervised dereverberation approaches. An audio demo page is available at https://usddps.github.io/USDDPS_demo/.
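To make the "analytical" FCP step more concrete, here is a minimal sketch of a forward-convolutive-prediction-style filter fit in the STFT domain. It assumes that, for each frequency bin, a non-reference channel Y is modeled as a K-tap convolution (along time frames) of the current clean-speech estimate S, and it fits those taps by per-frequency least squares. The helper names `fcp_filter` and `apply_filter`, the tap count `K`, and the inverse-mixture-power weighting are illustrative assumptions, not the authors' code or their exact weighting scheme.

```python
import numpy as np


def fcp_filter(S, Y, K=10, eps=1e-8, weighted=True):
    """Hypothetical helper: fit per-frequency K-tap filters g so that
    Y[f, t] ~= sum_k g[f, k] * S[f, t - k].

    S, Y: complex STFTs of shape (F, T) (source estimate, one mic channel).
    Returns g of shape (F, K). Assumes K <= T.
    """
    F, T = S.shape
    g = np.zeros((F, K), dtype=complex)
    for f in range(F):
        # Delayed-source matrix: column k holds S[f, t - k], zero-padded at the start.
        A = np.zeros((T, K), dtype=complex)
        for k in range(K):
            A[k:, k] = S[f, : T - k]
        # Assumed weighting: inverse per-frame mixture power (plain LS if weighted=False).
        w = 1.0 / (np.abs(Y[f]) ** 2 + eps) if weighted else np.ones(T)
        sw = np.sqrt(w)
        # Weighted least squares: minimise sum_t w_t * |Y[f, t] - A[t] @ g|^2.
        g[f], *_ = np.linalg.lstsq(sw[:, None] * A, sw * Y[f], rcond=None)
    return g


def apply_filter(S, g):
    """Convolve each frequency row of S with its K-tap filter along time frames."""
    T = S.shape[1]
    K = g.shape[1]
    out = np.zeros_like(S)
    for k in range(K):
        out[:, k:] += g[:, k : k + 1] * S[:, : T - k]
    return out
```

In a posterior-sampling loop of this kind, S would be the diffusion model's current clean-speech estimate, and `apply_filter(S, fcp_filter(S, Y_m))` would give a predicted reverberant channel m that can enter a mixture-consistency loss. The appeal of a closed-form fit is that it costs one small least-squares solve per frequency per step, rather than an inner iterative optimization.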
Similar Papers
ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior
Audio and Speech Processing
Separates voices from noisy recordings.
Unsupervised Single-Channel Audio Separation with Diffusion Source Priors
Audio and Speech Processing
Separates music into individual instruments without needing original recordings.
Coupled Data and Measurement Space Dynamics for Enhanced Diffusion Posterior Sampling
Machine Learning (CS)
Makes blurry pictures clear from bad data.