VCTR: A Transformer-Based Model for Non-parallel Voice Conversion

Published: October 14, 2025 | arXiv ID: 2510.12964v1

By: Maharnab Saikia

Potential Business Impact:

Changes voices without needing matching recordings.

Business Areas:
Speech Recognition Data and Analytics, Software

Non-parallel voice conversion aims to convert a voice from a source domain to a target domain without paired training data. Cycle-Consistent Generative Adversarial Networks (CycleGAN) and Variational Autoencoders (VAE) have been used for this task, but these models suffer from unstable training and unsatisfactory results. Later, Contrastive Voice Conversion (CVC) was introduced, using a contrastive learning-based approach to address these issues. However, these methods rely on CNN-based generators, which can capture local semantics but lack the ability to model the long-range dependencies needed for global semantics. In this paper, we propose VCTR, an efficient method for non-parallel voice conversion that leverages the Hybrid Perception Block (HPB) and Dual Pruned Self-Attention (DPSA) along with a contrastive learning-based adversarial approach. The code can be found at https://github.com/Maharnab-Saikia/VCTR.
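The locality argument above can be made concrete with a toy experiment. The sketch below (illustrative only; it is not the paper's DPSA or HPB, and all weights and sizes are made up) compares plain scaled dot-product self-attention against a 3-tap 1-D convolution: perturbing a single input frame changes the attention output at every time step, while the convolution output changes only inside the local window.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over frames (T, d).
    Every output frame can attend to every input frame (global receptive field)."""
    d = x.shape[-1]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                       # (T, T): all-pairs interactions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                  # row-wise softmax
    return w @ v

def conv1d_same(x, kernel):
    """Depthwise 1-D convolution with 'same' padding.
    Each output frame sees only a local window of the input."""
    k, pad = len(kernel), len(kernel) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + k] * kernel[:, None]).sum(axis=0)
                     for t in range(x.shape[0])])

rng = np.random.default_rng(0)
T, d = 8, 4
x = rng.standard_normal((T, d))                         # 8 frames, 4 channels
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
kern = np.ones(3) / 3                                   # 3-tap averaging kernel

# Perturb only the first frame.
x2 = x.copy()
x2[0] += 1.0

d_attn = np.abs(self_attention(x2, Wq, Wk, Wv) - self_attention(x, Wq, Wk, Wv)).max(axis=1)
d_conv = np.abs(conv1d_same(x2, kern) - conv1d_same(x, kern)).max(axis=1)

print((d_attn > 1e-9).all())    # attention: the perturbation reaches every frame
print((d_conv[2:] < 1e-12).all())  # conv: frames outside the 3-tap window are untouched
```

This is the receptive-field gap the abstract refers to: a CNN generator needs many stacked layers before a frame can influence distant frames, whereas a single attention layer already mixes information across the whole utterance.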

Repos / Data Links
https://github.com/Maharnab-Saikia/VCTR

Page Count
7 pages

Category
Computer Science: Sound