From Clicks to Preference: A Multi-stage Alignment Framework for Generative Query Suggestion in Conversational System
By: Junhao Yin, Haolin Wang, Peng Bao, and more
Potential Business Impact:
Makes chatbots understand what you really want.
Generative query suggestion using large language models offers a powerful way to enhance conversational systems, but aligning outputs with nuanced user preferences remains a critical challenge. To address this, we introduce a multi-stage framework designed for progressive alignment between the generation policy and user intent. Our pipeline begins with prompt engineering as a cold-start strategy, followed by a Supervised Fine-Tuning (SFT) stage, in which we introduce a distillation method on click logs to create a robust foundational model. To better model user preferences while capturing their inherent uncertainty, we develop a Gaussian Reward Model (GaRM) that represents user preferences as probability distributions rather than point estimates. Finally, we employ reinforcement learning to align the generation policy with these preferences, guided by a composite reward function that integrates GaRM with auxiliary heuristics to mitigate reward hacking. To maintain training stability, this process is enhanced by a novel out-of-distribution regularization method and a two-stage reward fusion technique. Extensive experiments demonstrate that our framework significantly outperforms baselines on both automatic and human evaluations and yields a 34% relative increase in user engagement as measured by click-through rate in live A/B tests.
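The abstract's central idea is that the reward model outputs a distribution over preference scores rather than a single scalar. Below is a minimal sketch of that idea, not the authors' implementation: a reward head predicting a Gaussian (mean and log-variance) over click-derived labels, trained with a Gaussian negative log-likelihood. All names here (GaussianRewardHead, gaussian_nll, hidden_dim) are illustrative assumptions, and the paper's exact architecture and loss may differ.

```python
# Sketch of a distributional reward head in the spirit of GaRM (assumed design).
import torch
import torch.nn as nn

class GaussianRewardHead(nn.Module):
    """Maps a pooled LLM hidden state to a reward distribution N(mu, sigma^2)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, 1)       # predicted mean reward
        self.log_var = nn.Linear(hidden_dim, 1)  # predicted log-variance (uncertainty)

    def forward(self, h: torch.Tensor):
        return self.mu(h).squeeze(-1), self.log_var(h).squeeze(-1)

def gaussian_nll(mu, log_var, target):
    """Negative log-likelihood of targets under the predicted Gaussian."""
    return 0.5 * (log_var + (target - mu) ** 2 / log_var.exp()).mean()

# Usage: score a batch of pooled hidden states against click-derived labels.
head = GaussianRewardHead(hidden_dim=768)
h = torch.randn(4, 768)                       # pooled suggestion representations
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])   # e.g. clicked / not clicked
mu, log_var = head(h)
loss = gaussian_nll(mu, log_var, labels)
loss.backward()
```

Predicting a variance alongside the mean lets a downstream composite reward down-weight suggestions the model is unsure about, which is one plausible way such a distributional head could help mitigate reward hacking during RL.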
Similar Papers
From Prompting to Alignment: A Generative Framework for Query Recommendation
Information Retrieval
Helps search engines better guess what you want.
Generative Early Stage Ranking
Machine Learning (CS)
Helps online suggestions find what you like faster.
AI Guided Accelerator For Search Experience
Information Retrieval
Helps online shoppers find things faster.