Linear Script Representations in Speech Foundation Models Enable Zero-Shot Transliteration
By: Ryan Soh-Eun Shim, Kwanghee Choi, Kalvin Chang, and more
Potential Business Impact:
Lets users choose which script computers use to write spoken words.
Multilingual speech foundation models such as Whisper are trained on web-scale data, where the data for each language spans a myriad of regional varieties. However, different regional varieties often employ different scripts to write the same language, making the script of speech recognition output non-deterministic as well. To mitigate this problem, we show that script is linearly encoded in the activation space of multilingual speech models, and that modifying activations at inference time enables direct control over the output script. We find that adding such script vectors to activations at test time can induce script changes even in unconventional language-script pairings (e.g. Italian in Cyrillic and Japanese in Latin script). We apply this approach to induce post-hoc control over the script of speech recognition output, where we observe competitive performance across all model sizes of Whisper.
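The core idea lends itself to a short sketch: estimate a "script vector" as the difference between mean decoder activations on transcripts written in two different scripts, then add that vector back into the same layer's activations during decoding via a forward hook. The sketch below is illustrative only, assuming a Hugging Face Whisper checkpoint; the layer index, steering strength, and helper names are hypothetical choices, not the authors' exact recipe.

```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

MODEL_NAME = "openai/whisper-small"   # any Whisper size; the paper reports all sizes
LAYER = 6                             # hypothetical decoder layer to steer
ALPHA = 4.0                           # hypothetical steering strength

processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).eval()


def mean_layer_activation(audio, transcript_ids):
    """Mean hidden state at LAYER for one (audio, forced transcript) pair."""
    feats = processor(audio, sampling_rate=16_000, return_tensors="pt").input_features
    with torch.no_grad():
        out = model(input_features=feats,
                    decoder_input_ids=transcript_ids,
                    output_hidden_states=True)
    # decoder_hidden_states[0] is the embedding output, so LAYER's output is index LAYER + 1
    return out.decoder_hidden_states[LAYER + 1].mean(dim=(0, 1))


# Hypothetical construction: average activations over examples transcribed in each
# script, then take the difference as the script direction, e.g.
#   script_vector = mean(Cyrillic examples) - mean(Latin examples)
script_vector = torch.zeros(model.config.d_model)  # placeholder in this sketch


def add_script_vector(module, inputs, output):
    """Forward hook: shift the layer's hidden states along the script direction."""
    hidden = output[0] + ALPHA * script_vector.to(output[0].dtype)
    return (hidden,) + output[1:]


# Steer the chosen decoder layer during generation, then remove the hook.
hook = model.model.decoder.layers[LAYER].register_forward_hook(add_script_vector)
# predicted_ids = model.generate(input_features=...)  # output should follow the target script
hook.remove()
```

Because the intervention is just a vector addition at a single layer, it can be switched on or off per utterance at inference time without any retraining, which is what makes this kind of post-hoc script control cheap.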
Similar Papers
Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically
Computation and Language
Helps computers understand many languages by listening.
The Role of Orthographic Consistency in Multilingual Embedding Models for Text Classification in Arabic-Script Languages
Computation and Language
Helps computers understand Arabic-script languages better.
Happiness is Sharing a Vocabulary: A Study of Transliteration Methods
Computation and Language
Helps computers understand different languages better.