The Personalization Paradox: Semantic Loss vs. Reasoning Gains in Agentic AI Q&A
By: Satyajit Movidi, Stephen Russell
Potential Business Impact:
Makes AI tutors give better, personalized advice.
AIVisor, an agentic retrieval-augmented LLM for student advising, was used to examine how personalization affects system performance across multiple evaluation dimensions. Using twelve authentic advising questions intentionally designed to stress lexical precision, we compared ten personalized and non-personalized system configurations and analyzed outcomes with a Linear Mixed-Effects Model across lexical (BLEU, ROUGE-L), semantic (METEOR, BERTScore), and grounding (RAGAS) metrics. Results showed a consistent trade-off: personalization reliably improved reasoning quality and grounding, yet introduced a significant negative interaction on semantic similarity, driven not by poorer answers but by the limits of current metrics, which penalize meaningful personalized deviations from generic reference texts. This reveals a structural flaw in prevailing LLM evaluation methods, which are ill-suited for assessing user-specific responses. The fully integrated personalized configuration produced the highest overall gains, suggesting that personalization can enhance system effectiveness when evaluated with appropriate multidimensional metrics. Overall, the study demonstrates that personalization produces metric-dependent shifts rather than uniform improvements and provides a methodological foundation for more transparent and robust personalization in agentic AI.
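The abstract's core analysis, a Linear Mixed-Effects Model testing whether personalization interacts negatively with semantic-similarity metrics, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual code: the data are synthetic, and the column names (`question`, `personalized`, `semantic`, `score`) and effect sizes are assumptions chosen only to reproduce the reported pattern (lexical scores roughly flat, semantic scores depressed under personalization).

```python
# Hypothetical sketch of an LME analysis like the one described in the
# abstract: score ~ personalization x metric-family, with the twelve
# advising questions as a random (grouping) effect.
# All data are synthetic; names and magnitudes are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
rows = []
for q in range(12):                      # 12 advising questions
    q_effect = rng.normal(0, 0.05)       # per-question random intercept
    for pers in (0, 1):                  # non-personalized vs personalized
        for fam in (0, 1):               # 0 = lexical (BLEU/ROUGE-L), 1 = semantic (BERTScore)
            base = 0.5 + 0.1 * fam
            # Simulated trade-off: personalization slightly helps overall but
            # lowers semantic-similarity scores (negative interaction).
            score = (base + 0.02 * pers - 0.08 * pers * fam
                     + q_effect + rng.normal(0, 0.02))
            rows.append({"question": q, "personalized": pers,
                         "semantic": fam, "score": score})
df = pd.DataFrame(rows)

# Fixed effects: personalization, metric family, and their interaction;
# random intercept per question.
model = smf.mixedlm("score ~ personalized * semantic", df,
                    groups=df["question"])
fit = model.fit()
interaction = fit.params["personalized:semantic"]
print(f"personalization x semantic interaction: {interaction:.3f}")
```

Under this simulation the fitted interaction coefficient comes out negative, mirroring the paper's finding that semantic metrics penalize personalized deviations from generic reference texts even when the underlying answers are not worse.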
Similar Papers
Enabling Personalized Long-term Interactions in LLM-based Agents through Persistent Memory and User Profiles
Artificial Intelligence
AI remembers you for better conversations.
A Survey of Personalization: From RAG to Agent
Information Retrieval
AI learns what you like to help you better.
Reasoning LLMs for User-Aware Multimodal Conversational Agents
Human-Computer Interaction
Robot learns about you instantly for better chats.