FCPE: A Fast Context-based Pitch Estimation Model

Published: September 18, 2025 | arXiv ID: 2509.15140v1

By: Yuxin Luo, Ruoyi Zhang, Lu-Chuan Liu and more

Potential Business Impact:

Helps computers track the pitch of a singing voice quickly and accurately, even in noisy recordings.

Business Areas:
Facial Recognition Data and Analytics, Software

Pitch estimation (PE) in monophonic audio is crucial for MIDI transcription and singing voice conversion (SVC), but existing methods suffer significant performance degradation under noise. In this paper, we propose FCPE, a fast context-based pitch estimation model that employs a Lynx-Net architecture with depth-wise separable convolutions to effectively capture mel spectrogram features while maintaining low computational cost and robust noise tolerance. Experiments show that our method achieves 96.79% Raw Pitch Accuracy (RPA) on the MIR-1K dataset, on par with state-of-the-art methods. The Real-Time Factor (RTF) is 0.0062 on a single RTX 4090 GPU, which significantly outperforms existing algorithms in efficiency. Code is available at https://github.com/CNChTu/FCPE.
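The two metrics quoted in the abstract can be illustrated with a minimal sketch. It is not taken from the FCPE repository. It assumes the common convention that RPA counts a voiced frame as correct when the estimate lies within 50 cents of the reference, and that RTF is processing time divided by audio duration. The function names, the 50-cent default, and the sample values are illustrative assumptions.

```python
import math


def raw_pitch_accuracy(ref_hz, est_hz, tol_cents=50.0):
    """Fraction of reference-voiced frames whose estimate is within tol_cents.

    Frames with a reference of 0 Hz are treated as unvoiced and skipped
    (an assumed convention, not confirmed by the source).
    """
    voiced = [(r, e) for r, e in zip(ref_hz, est_hz) if r > 0]
    if not voiced:
        return 0.0
    correct = 0
    for r, e in voiced:
        # An estimate of 0 Hz (predicted unvoiced) can never match a voiced frame.
        if e > 0 and abs(1200.0 * math.log2(e / r)) <= tol_cents:
            correct += 1
    return correct / len(voiced)


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; lower is faster."""
    return processing_seconds / audio_seconds


# Hypothetical per-frame pitch tracks in Hz (0.0 marks an unvoiced frame).
ref = [220.0, 220.0, 0.0, 440.0]
est = [221.0, 233.0, 100.0, 0.0]

rpa = raw_pitch_accuracy(ref, est)   # 1 of the 3 voiced frames is within 50 cents
rtf = real_time_factor(0.062, 10.0)  # 0.062 s spent on 10 s of audio

print(f"RPA = {rpa:.4f}, RTF = {rtf:.4f}")
```

Under this definition, the reported RTF of 0.0062 means one minute of audio would take roughly 0.37 seconds to process on the stated hardware.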

Page Count
5 pages

Category
Computer Science: Sound