Pseudo-Cepstrum: Pitch Modification for Mel-Based Neural Vocoders

Published: December 18, 2025 | arXiv ID: 2512.16519v1

By: Nikolaos Ellinas, Alexandra Vioni, Panos Kakoulidis and more

This paper introduces a cepstrum-based pitch modification method that can be applied to any mel-spectrogram representation. The method is therefore compatible with any mel-based vocoder and requires no additional training or changes to the model. It works by directly modifying the cepstrum feature space to shift the harmonic structure to the desired target. The spectrogram magnitude is computed via the pseudo-inverse mel transform and then converted to the cepstrum by applying the DCT. In this domain, the cepstral peak is shifted without having to estimate its position, and the modified mel-spectrogram is recomputed by applying the IDCT followed by the mel filterbank. The resulting pitch-shifted mel-spectrogram features can be converted to speech with any compatible vocoder. The proposed method is validated experimentally with objective and subjective metrics on various state-of-the-art neural vocoders and is compared with traditional pitch modification methods.
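
The pipeline described in the abstract (pseudo-inverse mel transform, DCT, cepstral shift, IDCT, mel filterbank) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the peak is moved, so the sketch assumes one plausible approach: resampling the quefrency axis above a lifter cutoff, which moves any peak without locating it. Taking the log of the magnitude before the DCT is also an assumption (the usual cepstrum definition). The function names, the hand-built triangular filterbank, and the parameters `ratio` and `n_lifter` are all hypothetical.

```python
import numpy as np
from scipy.fft import dct, idct


def mel_filterbank(n_mels, n_fft, sr):
    """Simple triangular mel filterbank, shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def warp_cepstrum(ceps, ratio, n_lifter=40):
    """ASSUMED shift rule: move content at quefrency q to q / ratio.

    Coefficients below n_lifter (the spectral envelope) are left untouched.
    ceps has shape (n_quefrency, n_frames).
    """
    q = np.arange(ceps.shape[0], dtype=float)
    out = ceps.copy()
    for t in range(ceps.shape[1]):
        warped = np.interp(q * ratio, q, ceps[:, t], right=0.0)
        out[n_lifter:, t] = warped[n_lifter:]
    return out


def shift_pitch_mel(mel, fb, ratio, n_lifter=40, eps=1e-8):
    """mel: linear-magnitude mel-spectrogram (n_mels, n_frames).

    ratio > 1 raises the pitch, ratio < 1 lowers it.
    """
    # 1. Approximate the linear magnitude with the pseudo-inverse filterbank.
    mag = np.maximum(np.linalg.pinv(fb) @ mel, eps)
    # 2. Log magnitude (assumed) -> DCT along the frequency axis.
    ceps = dct(np.log(mag), type=2, axis=0, norm="ortho")
    # 3. Shift the harmonic (high-quefrency) part without peak picking.
    ceps = warp_cepstrum(ceps, ratio, n_lifter)
    # 4. IDCT -> linear magnitude -> mel filterbank.
    new_mag = np.exp(idct(ceps, type=2, axis=0, norm="ortho"))
    return fb @ new_mag
```

A call such as `shift_pitch_mel(mel, mel_filterbank(80, 1024, 22050), ratio=1.2)` would return a mel-spectrogram of the same shape that could then be passed to a vocoder. In practice the filterbank, scaling, and log convention must match those the vocoder was trained with.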

Enhancing Spectrogram Realism in Singing Voice Synthesis via Explicit Bandwidth Extension Prior to Vocoder

Sound

Makes fake singing voices sound real.

3 Aug 2025

86%

Real-Time Pitch/F0 Detection Using Spectrogram Images and Convolutional Neural Networks

Sound

Helps computers hear singing pitch better.

8 Apr 2025

85%

Real-Time Streaming Mel Vocoding with Generative Flow Matching

Audio and Speech Processing

Makes computer voices sound more real, faster.

18 Sep 2025
