PseudoVC: Improving One-shot Voice Conversion with Pseudo Paired Data

Published: June 1, 2025 | arXiv ID: 2506.01039v1

By: Songjun Cao, Qinghua Wu, Jie Chen and more

Potential Business Impact:

Changes one person's voice to sound like another.

Business Areas:

Speech Recognition Data and Analytics, Software

Because parallel training data is scarce for one-shot voice conversion (VC), VC systems are typically trained by waveform reconstruction. A typical one-shot VC system comprises a content encoder and a speaker encoder. However, two mismatches arise between training and inference: one in the inputs to the content encoder, and another in the inputs to the speaker encoder. To address these mismatches, we propose a novel VC training method called PseudoVC. First, we introduce an information perturbation approach named Pseudo Conversion to tackle the first mismatch. This approach uses pretrained VC models to convert the source utterance into a perturbed utterance, which is fed into the content encoder during training. Second, we propose an approach termed Speaker Sampling to resolve the second mismatch: during training, the input to the speaker encoder is replaced with another utterance from the same speaker. Experimental results demonstrate that the proposed Pseudo Conversion outperforms previous information perturbation methods, and that the overall PseudoVC method surpasses publicly available VC models. Audio examples are available.
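The two ideas in the abstract can be illustrated with a minimal Python sketch of how one training example might be assembled. This is not the authors' implementation. The function `build_training_example`, the `pseudo_convert` callable (standing in for a pretrained VC model), the file names, and the choice to draw the conversion reference from a different speaker are all assumptions made for illustration; the abstract only states that a pretrained VC model perturbs the source utterance and that the speaker encoder receives another utterance from the same speaker.

```python
import random
from typing import Callable, Dict, List, Tuple


def build_training_example(
    utterances_by_speaker: Dict[str, List[str]],
    speaker: str,
    target_utt: str,
    pseudo_convert: Callable[[str, str], str],
    rng: random.Random,
) -> Tuple[str, str, str]:
    """Return (content_input, speaker_input, reconstruction_target).

    Hypothetical helper: all names here are illustrative assumptions.
    """
    # Speaker Sampling: feed the speaker encoder a different utterance
    # from the same speaker (fall back to the target if none exists).
    same_speaker = [u for u in utterances_by_speaker[speaker] if u != target_utt]
    speaker_input = rng.choice(same_speaker) if same_speaker else target_utt

    # Pseudo Conversion: perturb the source utterance with a pretrained
    # VC model. Assumption: the reference comes from another speaker.
    other_speakers = [s for s in utterances_by_speaker if s != speaker]
    if not other_speakers:
        raise ValueError("need at least two speakers for pseudo conversion")
    ref_speaker = rng.choice(other_speakers)
    ref_utt = rng.choice(utterances_by_speaker[ref_speaker])
    content_input = pseudo_convert(target_utt, ref_utt)

    # The model is still trained to reconstruct the original utterance.
    return content_input, speaker_input, target_utt


# Toy usage with a stand-in for the pretrained VC model.
data = {"spk_a": ["a1.wav", "a2.wav"], "spk_b": ["b1.wav"]}
fake_vc = lambda src, ref: f"vc({src}->{ref})"
content_in, speaker_in, target = build_training_example(
    data, "spk_a", "a1.wav", fake_vc, random.Random(0)
)
```

In this toy setup the content encoder would receive the pseudo-converted `a1.wav`, the speaker encoder would receive `a2.wav`, and the reconstruction target would remain `a1.wav`, so the training inputs resemble the inference case where content and speaker come from different audio.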

O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion

Sound

Changes your voice to sound like anyone.

10 Oct 2025 1

88%

Pureformer-VC: Non-parallel Voice Conversion with Pure Stylized Transformer Blocks and Triplet Discriminative Training

Sound

Changes voices to sound like anyone else.

10 Jun 2025 1

88%

Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching

Audio and Speech Processing

Changes your voice to sound like anyone else.

1 Jun 2025 2

Page Count

5 pages
