SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation

Published: April 7, 2025 | arXiv ID: 2504.05106v1

By: Stephen Brade, Sam Anderson, Rithesh Kumar and more

BigTech Affiliations: Massachusetts Institute of Technology

Potential Business Impact:

Makes talking videos sound like you want.

Business Areas:

Speech Recognition, Data and Analytics, Software

Novice content creators often invest significant time recording expressive speech for social media videos. While recent advances in text-to-speech (TTS) technology can generate highly realistic speech in various languages and accents, many creators struggle with unintuitive or overly granular TTS interfaces. We propose simplifying TTS generation by allowing users to specify high-level context alongside their script. Our Wizard-of-Oz system, SpeakEasy, leverages user-provided context to inform and influence TTS output, enabling iterative refinement with high-level feedback. This approach was informed by two 8-subject formative studies: one examining content creators' experiences with TTS, and the other drawing on effective strategies from voice actors. Our evaluation shows that participants using SpeakEasy were more successful in generating performances matching their personal standards, without requiring significantly more effort than leading industry interfaces.

Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication

Human-Computer Interaction

Helps people speak with more feeling and detail.

21 Mar 2025 0

88%

StepWrite: Adaptive Planning for Speech-Driven Text Generation

Human-Computer Interaction

Lets you write long texts using only your voice.

6 Aug 2025 1

88%

Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis

Sound

Makes computers speak any language, even rare ones.

10 Apr 2025 1

Country of Origin

🇺🇸 United States

Page Count

19 pages
