Synthetic Prefixes to Mitigate Bias in Real-Time Neural Query Autocomplete

Published: October 2, 2025 | arXiv ID: 2510.01574v1

By: Adithya Rajan, Xiaoyu Liu, Prateek Verma, and more

Potential Business Impact:

Finds better search results faster.

Business Areas:

Semantic Search, Internet Services

We introduce a data-centric approach for mitigating presentation bias in real-time neural query autocomplete systems through the use of synthetic prefixes. These prefixes are generated from complete user queries collected during regular search sessions where autocomplete was not active. This allows us to enrich the training data for learning-to-rank models with more diverse and less biased examples. The method addresses the inherent bias in engagement signals collected from live query autocomplete interactions, where model suggestions influence user behavior. Our neural ranker is optimized for real-time deployment under strict latency constraints and incorporates a rich set of features, including query popularity, seasonality, fuzzy match scores, and contextual signals such as department affinity, device type, and vertical alignment with previous user queries. To support efficient training, we introduce a task-specific simplification of the listwise loss that reduces computational complexity from $O(n^2)$ to $O(n)$ by exploiting the fact that query autocomplete has only one ground-truth selection per prefix. Deployed in a large-scale e-commerce setting, our system demonstrates statistically significant improvements in user engagement, as measured by mean reciprocal rank and related metrics. Our findings show that synthetic prefixes not only improve generalization but also provide a scalable path toward bias mitigation in other low-latency ranking tasks, including related searches and query recommendations.
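To make the two ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how prefixes are sampled or give the exact loss, so the function names `synthetic_prefixes` and `single_positive_listwise_loss`, the `min_len` parameter, the choice of emitting every character-level prefix, and the use of softmax cross-entropy as the single-positive listwise loss are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def synthetic_prefixes(query: str, min_len: int = 1) -> List[Tuple[str, str]]:
    """Turn one complete query into (prefix, target_query) training pairs.

    Assumption: every proper character-level prefix of at least `min_len`
    characters is emitted. The paper may sample prefixes differently.
    """
    q = query.strip().lower()
    return [(q[:k], q) for k in range(min_len, len(q))]


def single_positive_listwise_loss(scores: List[float], positive_index: int) -> float:
    """Softmax cross-entropy over n candidates with exactly one positive.

    With a single ground-truth selection, the loss is
    logsumexp(scores) - scores[positive_index], which takes one O(n) pass
    and needs no pairwise comparisons. This is one plausible reading of the
    O(n) simplification, not the paper's confirmed formula.
    """
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive_index]


if __name__ == "__main__":
    # A query typed in a session without autocomplete (hypothetical example).
    for prefix, target in synthetic_prefixes("shoes", min_len=2):
        print(prefix, "->", target)

    # Hypothetical ranker scores for four candidate completions of one prefix;
    # index 1 is the query the user actually submitted.
    print(single_positive_listwise_loss([1.2, 2.5, 0.3, -0.7], positive_index=1))
```

Because the complete queries come from sessions where no suggestions were shown, the targets in these pairs are not influenced by what an earlier model ranked first, which is the presentation-bias argument the abstract makes.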

Semantic De-boosting in e-commerce Query Autocomplete

Information Theory

Shows shoppers better, different product ideas.

13 May 2025

86%

Improving Synthetic Data Training for Contextual Biasing Models with a Keyword-Aware Cost Function

Computation and Language

Helps computers understand rare words better.

11 Sep 2025

86%

Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation

Computation and Language

Helps computers understand rare words they haven't heard.

25 Aug 2025

Page Count

7 pages
