A Novel Semantic Compression Approach for Ultra-low Bandwidth Voice Communication
By: Ryan Collette, Ross Greenwood, Serena Nicoll
Potential Business Impact:
Makes voices sound clear with less data.
While existing speech audio codecs designed for compression exploit limited forms of temporal redundancy and allow for multi-scale representations, they tend to represent all features of audio in the same way. In contrast, generative voice models designed for text-to-speech and voice transfer tasks have recently proved effective at factorizing audio signals into high-level semantic representations of fundamentally distinct features. In this paper, we leverage such representations in a novel semantic communications approach to achieve lower bitrates without sacrificing perceptual quality or suitability for specific downstream tasks. Our technique matches or outperforms existing audio codecs on transcription, sentiment analysis, and speaker verification when encoding at 2-4x lower bitrate -- notably surpassing Encodec in perceptual quality and speaker verification at up to 4x lower bitrate.
Similar Papers
Generative Semantic Coding for Ultra-Low Bitrate Visual Communication and Analysis
CV and Pattern Recognition
Sends clear pictures using tiny amounts of data.
STCTS: Generative Semantic Compression for Ultra-Low Bitrate Speech via Explicit Text-Prosody-Timbre Decomposition
Sound
Lets people talk clearly on slow internet.