A stylometric analysis of speaker attribution from speech transcripts
By: Cristina Aggazzotti, Elizabeth Allyn Smith
Potential Business Impact:
Identifies speakers by what they say, not just how they say it.
Forensic scientists often need to identify an unknown speaker or writer in cases such as ransom calls, covert recordings, alleged suicide notes, or anonymous online communications. Speaker recognition in the speech domain usually examines phonetic or acoustic properties of a voice, and these methods can be accurate and robust under certain conditions. However, if a speaker disguises their voice or employs text-to-speech software, vocal properties may no longer be reliable, leaving only the linguistic content of the speech available for analysis. Authorship attribution methods traditionally use syntactic, semantic, and related linguistic information to identify the writers of written text. In this paper, we apply a content-based authorship approach to speech that has been transcribed into text, using what a speaker says to attribute speech to individuals (speaker attribution). We introduce a stylometric method, StyloSpeaker, which incorporates character, word, token, sentence, and style features from the stylometric literature on authorship, to assess whether two transcripts were produced by the same speaker. We evaluate this method on two types of transcript formatting: one approximating prescriptive written text, with capitalization and punctuation, and a normalized style that removes these conventions. The transcripts' conversation topics are also controlled to varying degrees. We find generally higher attribution performance on normalized transcripts, except under the strongest topic control condition, in which overall performance is highest. Finally, we compare this more explainable stylometric model to black-box neural approaches on the same data and investigate which stylistic features most effectively distinguish speakers.
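To make the idea in the abstract above concrete, the following is a minimal Python sketch of a stylometric comparison between two transcripts. It is not StyloSpeaker: the function names (`normalize`, `features`, `cosine`), the short `FUNCTION_WORDS` list, the three feature types, and the use of cosine similarity are all assumptions chosen for illustration. The features are left unscaled, so average word length dominates the similarity score; a real system would standardize them and use far more features.

```python
import math
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for this sketch).
FUNCTION_WORDS = ("the", "a", "and", "i", "you", "it",
                  "that", "of", "to", "like", "um", "uh")


def normalize(transcript: str) -> str:
    """Lowercase and strip ASCII punctuation, approximating a normalized transcript."""
    no_punct = transcript.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())


def features(transcript: str) -> list:
    """Return a small stylometric vector:
    [average word length, type-token ratio, relative frequency of each function word].
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return [0.0] * (2 + len(FUNCTION_WORDS))
    counts = Counter(words)
    avg_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(counts) / len(words)
    function_word_freqs = [counts[w] / len(words) for w in FUNCTION_WORDS]
    return [avg_word_len, type_token_ratio] + function_word_freqs


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    transcript_a = "Well, I think that it was, like, the best day of the year."
    transcript_b = "Um, you know, I think it was the worst day, and that is that."
    score = cosine(features(normalize(transcript_a)),
                   features(normalize(transcript_b)))
    print(f"similarity: {score:.3f}")
```

A same-speaker decision would then compare such a score against a threshold tuned on held-out transcript pairs; the threshold, like everything else in this sketch, is an assumption rather than something the abstract specifies.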
Similar Papers
Distinguishing AI-Generated and Human-Written Text Through Psycholinguistic Analysis
Computation and Language
Finds fake writing by looking at thinking patterns.
LLM one-shot style transfer for Authorship Attribution and Verification
Computation and Language
Finds who wrote text, even if it's AI.
It takes a village to write a book: Mapping anonymous contributions in Stephen Langton's Quaestiones Theologiae
Computation and Language
Uncovers how old books were written together.