What Do Prosody and Text Convey? Characterizing How Meaningful Information is Distributed Across Multiple Channels
By: Aditya Yadavalli, Tiago Pimentel, Tamar I Regev and more
Potential Business Impact:
Measures how much speech melody, beyond the words themselves, tells listeners about emotion, sarcasm, and whether an utterance is a question.
Prosody -- the melody of speech -- conveys critical information often not captured by the words or text of a message. In this paper, we propose an information-theoretic approach to quantify how much information is expressed by prosody alone and not by text, and crucially, what that information is about. Our approach applies large speech and language models to estimate the mutual information between a particular dimension of an utterance's meaning (e.g., its emotion) and any of its communication channels (e.g., audio or text). We then use this approach to quantify how much information is conveyed by audio and text about sarcasm, emotion, and questionhood, using speech from television and podcasts. We find that for sarcasm and emotion the audio channel -- and by implication the prosodic channel -- transmits over an order of magnitude more information about these features than the text channel alone, at least when long-term context beyond the current sentence is unavailable. For questionhood, prosody provides comparatively less additional information. We conclude by outlining a program applying our approach to more dimensions of meaning, communication channels, and languages.
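As a rough illustration of the kind of estimate the abstract describes, the sketch below computes mutual information as I(Y; X) = H(Y) - H(Y | X), approximating the conditional entropy by the cross-entropy of a model's predicted label probabilities. This is a common way to estimate mutual information from predictive models, assumed here for illustration; the abstract does not confirm it is the authors' exact procedure. The function names, the labels, and the toy probabilities are all hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Plug-in estimate of H(Y) in bits from observed label frequencies."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def cross_entropy_bits(labels, predicted_probs):
    """Average negative log2-probability a model assigns to the true labels.

    predicted_probs is a list of dicts mapping each label to a probability;
    this cross-entropy stands in for the conditional entropy H(Y | X).
    """
    total = sum(math.log2(p[y]) for y, p in zip(labels, predicted_probs))
    return -total / len(labels)


def mutual_information_bits(labels, predicted_probs):
    """Estimate I(Y; X) = H(Y) - H(Y | X) in bits."""
    return entropy_bits(labels) - cross_entropy_bits(labels, predicted_probs)


# Hypothetical toy data: four utterances labelled for sarcasm.
labels = ["sarcastic", "sincere", "sarcastic", "sincere"]

# Made-up outputs of an audio-based model that is fairly confident and correct.
audio_probs = [
    {"sarcastic": 0.9, "sincere": 0.1},
    {"sarcastic": 0.1, "sincere": 0.9},
    {"sarcastic": 0.9, "sincere": 0.1},
    {"sarcastic": 0.1, "sincere": 0.9},
]

# Made-up outputs of a text-only model that cannot tell the classes apart.
text_probs = [{"sarcastic": 0.5, "sincere": 0.5} for _ in labels]

mi_audio = mutual_information_bits(labels, audio_probs)
mi_text = mutual_information_bits(labels, text_probs)
```

In this toy setup the text-only model is no better than the label prior, so its estimate is zero bits, while the audio model's estimate is positive. Comparing the two estimates is the shape of the audio-versus-text comparison reported in the abstract, though the numbers here carry no empirical meaning.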
Similar Papers
Using Information Theory to Characterize Prosodic Typology: The Case of Tone, Pitch-Accent and Stress-Accent
Computation and Language
Languages use sound pitch to tell words apart.
The Prosody of Emojis
Computation and Language
Helps computers talk with feeling using emoji cues.
The time scale of redundancy between prosody and linguistic context
Computation and Language
Measures over what time scale prosody and linguistic context carry redundant information.