The TCG CREST -- RKMVERI Submission for the NCIIPC Startup India AI Grand Challenge
By: Nikhil Raghav, Arnab Banerjee, Janojit Chakraborty and more
This report summarizes the integrated multilingual audio processing pipeline that our team developed for the inaugural NCIIPC Startup India AI Grand Challenge, addressing Problem Statement 06: Language-Agnostic Speaker Identification and Diarisation, and subsequent Transcription and Translation System. Our primary focus was on advancing speaker diarization (SD), a critical component in multilingual and code-mixed scenarios, and our main aim was to study the real-world applicability of our in-house SD systems. To this end, we investigated a robust voice activity detection (VAD) technique and fine-tuned speaker embedding models for improved speaker identification in low-resource settings. We used our recently proposed multi-kernel consensus spectral clustering framework, which substantially improved diarization performance across all recordings in the training corpus provided by the organizers. Complementary modules for speaker and language identification, automatic speech recognition (ASR), and neural machine translation were integrated into the pipeline, and post-processing refinements further improved system robustness.
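The abstract does not describe how the multi-kernel consensus spectral clustering framework works, so the following is only a rough illustration of the general idea of clustering speaker embeddings with more than one kernel. It is a minimal sketch, not the authors' implementation. It assumes that the "consensus" is a plain average of Gaussian kernels over cosine distance, that there are exactly two speakers, and that segment embeddings are already available. The function names, the bandwidth values, and the synthetic embeddings are all hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

def consensus_affinity(embeddings, bandwidths=(0.25, 0.5, 1.0)):
    """Average several Gaussian kernels of the cosine distance.

    Assumption: averaging is used here as a stand-in consensus rule;
    the actual rule used in the paper is not given in the abstract.
    """
    dist = 1.0 - cosine_similarity_matrix(embeddings)
    kernels = [np.exp(-(dist ** 2) / (2.0 * s ** 2)) for s in bandwidths]
    return np.mean(kernels, axis=0)

def two_speaker_spectral_labels(affinity):
    """Split segments into two groups via the sign of the Fiedler vector."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    # Second-smallest eigenvector, rescaled to the random-walk form
    fiedler = eigvecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)

# Synthetic stand-ins for segment-level speaker embeddings (hypothetical data):
# six segments near one direction, six near an orthogonal direction.
rng = np.random.default_rng(0)
dim = 16
center_a = np.eye(dim)[0]
center_b = np.eye(dim)[1]
segments = np.vstack([
    center_a + 0.05 * rng.standard_normal((6, dim)),
    center_b + 0.05 * rng.standard_normal((6, dim)),
])

labels = two_speaker_spectral_labels(consensus_affinity(segments))
print(labels)
```

The label values themselves are arbitrary (which group is called 0 or 1 depends on the eigenvector's sign); only the grouping of segments matters. A real system would also need to estimate the number of speakers and handle more than two, which this sketch does not attempt.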