StressTransfer: Stress-Aware Speech-to-Speech Translation with Emphasis Preservation

Published: October 15, 2025 | arXiv ID: 2510.13194v1

By: Xi Chen, Yuchen Song, Satoshi Nakamura

Potential Business Impact:

Translates speech while keeping the speaker's word-level emphasis.

Business Areas:
Translation Service, Professional Services

We propose a stress-aware speech-to-speech translation (S2ST) system that preserves word-level emphasis by leveraging LLMs for cross-lingual emphasis conversion. Our method translates source-language stress into target-language tags that guide a controllable TTS model. To overcome data scarcity, we develop a pipeline that automatically generates aligned training data, and we introduce an "LLM-as-Judge" approach for evaluation. Experiments show that our approach substantially outperforms baselines in preserving emphasis while maintaining comparable translation quality, speaker intent, and naturalness. Our work highlights the importance of prosody in translation and provides an effective, data-efficient solution for preserving paralinguistic cues in S2ST.

Page Count
5 pages

Category
Computer Science:
Computation and Language