SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset
By: Yicheng Gu, Chaoren Wang, Junan Zhang, and more
Potential Business Impact:
Helps computers create more realistic singing voices.
The lack of a publicly available, large-scale, and diverse dataset has long been a significant bottleneck for singing voice applications such as Singing Voice Synthesis (SVS) and Singing Voice Conversion (SVC). To tackle this problem, we present SingNet, an extensive, diverse, and in-the-wild singing voice dataset. Specifically, we propose a data processing pipeline that extracts ready-to-use training data from sample packs and songs on the internet, yielding 3000 hours of singing voice in various languages and styles. Furthermore, to facilitate the use and demonstrate the effectiveness of SingNet, we pre-train and open-source various state-of-the-art (SOTA) models based on Wav2vec2, BigVGAN, and NSF-HiFiGAN, trained on our collected singing voice data. We also conduct benchmark experiments on Automatic Lyric Transcription (ALT), neural vocoding, and SVC. Audio demos are available at: https://singnet-dataset.github.io/.
Similar Papers
SingVERSE: A Diverse, Real-World Benchmark for Singing Voice Enhancement
Sound
Makes bad singing sound good for everyone.
DiTSinger: Scaling Singing Voice Synthesis with Diffusion Transformer and Implicit Alignment
Sound
Makes AI sing songs with real-sounding voices.
YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
Sound
Makes computers sing any song with any words.