What Triggers my Model? Contrastive Explanations Inform Gender Choices by Translation Models
By: Janiça Hackenbuchner, Arda Tezcan, Joke Daems
Potential Business Impact:
Finds why computer translations get gender wrong.
Interpretability can be used to understand the decisions of black-box models such as machine translation (MT) systems or large language models (LLMs). Yet research in this area has rarely been applied to a well-documented problem in these models: gender bias. With this research, we aim to move beyond simply measuring bias towards exploring its origins. Working with gender-ambiguous natural source data, this study examines which context, in the form of input tokens in the source sentence, influences (or triggers) a translation model's choice of a particular gender inflection in the target language. To analyse this, we use contrastive explanations and compute saliency attribution. We first address the challenge posed by the lack of an established scoring threshold, examining how different attribution levels of source words relate to the model's gender decisions in the translation. We then compare salient source words with human perceptions of gender and demonstrate a noticeable overlap between human perceptions and model attributions. Additionally, we provide a linguistic analysis of salient words. Our work showcases the relevance of understanding model translation decisions in terms of gender, shows how these compare to human decisions, and argues that this information should be leveraged to mitigate gender bias.
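The core technique, contrastive saliency attribution, can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it loads a public Marian English-to-German model (Helsinki-NLP/opus-mt-en-de, chosen here as an assumption; any Hugging Face seq2seq MT model works the same way), scores a masculine and a feminine inflection of the same translation under teacher forcing, and takes the gradient of their log-probability difference with respect to the source token embeddings. The per-token gradient norm then serves as a saliency score for how strongly each source word pushes the model towards one gender inflection. The example sentences and all names are hypothetical.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumption: a public Marian EN->DE model; any HF seq2seq MT model works similarly.
MODEL = "Helsinki-NLP/opus-mt-en-de"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)
model.eval()

# Gender-ambiguous source with a contrastive target pair (hypothetical example).
src = "The teacher laughed because the students enjoyed the lesson."
tgt_masc = "Der Lehrer lachte, weil die Schüler den Unterricht genossen."
tgt_fem = "Die Lehrerin lachte, weil die Schüler den Unterricht genossen."

enc = tok(src, return_tensors="pt")

# Capture the encoder's source-token embeddings so we can read their gradients.
# Marian ties encoder and decoder embeddings, so we keep only the embedding
# calls made with the source ids and skip the decoder's lookups.
captured = []
def save_source_embeddings(module, inputs, output):
    if torch.equal(inputs[0], enc.input_ids):
        output.retain_grad()
        captured.append(output)
handle = model.get_input_embeddings().register_forward_hook(save_source_embeddings)

def target_logprob(target_text):
    """Summed log-probability of a full target sentence under teacher forcing."""
    labels = tok(text_target=target_text, return_tensors="pt").input_ids
    logits = model(**enc, labels=labels).logits
    token_logprobs = logits.log_softmax(-1).gather(-1, labels.unsqueeze(-1))
    return token_logprobs.sum()

# Contrastive objective: how much more probable is the masculine inflection?
contrast = target_logprob(tgt_masc) - target_logprob(tgt_fem)
contrast.backward()
handle.remove()

# Saliency per source token: gradient L2 norm on its embedding, summed over
# the two forward passes (identical source inputs, separate autograd graphs).
grads = torch.stack([c.grad for c in captured]).sum(0)
saliency = grads.norm(dim=-1).squeeze(0)
for token, score in zip(tok.convert_ids_to_tokens(enc.input_ids[0]), saliency.tolist()):
    print(f"{token:>12}  {score:.4f}")
```

In line with the paper's point that no established scoring threshold exists, these raw scores would still need to be normalised (for instance, to sum to one over the source sentence) and cut off at some attribution level before individual source words are labelled as triggers.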
Similar Papers
Uncertainty Quantification for Evaluating Machine Translation Bias
Computation and Language
Makes computer translators less biased about gender.
Voice, Bias, and Coreference: An Interpretability Study of Gender in Speech Translation
Computation and Language
Studies how speech translators guess a speaker's gender from voice, not just pitch.
Addressing speaker gender bias in large scale speech translation systems
Computation and Language
Fixes translation mistakes for female speakers.