Score: 0

ADVOSYNTH: A Synthetic Multi-Advocate Dataset for Speaker Identification in Courtroom Scenarios

Published: January 15, 2026 | arXiv ID: 2601.10315v1

By: Aniket Deroy

As large-scale speech-to-speech models achieve high fidelity, the distinction between synthetic voices in structured environments becomes a vital area of study. This paper introduces Advosynth-500, a specialized dataset comprising 100 synthetic speech files featuring 10 unique advocate identities. Using the Speech Llama Omni model, we simulate five distinct advocate pairs engaged in courtroom arguments. We define specific vocal characteristics for each advocate and present a speaker identification challenge to evaluate the ability of modern systems to map audio files to their respective synthetic origins. Dataset is available at this link-https: //github.com/naturenurtureelite/ADVOSYNTH-500.

Category
Computer Science:
Computation and Language