Score: 3

Exploring Recommender System Evaluation: A Multi-Modal User Agent Framework for A/B Testing

Published: January 8, 2026 | arXiv ID: 2601.04554v1

By: Wenlin Zhang, Xiangyang Li, Qiyuan Ge and more

BigTech Affiliations: Huawei

Potential Business Impact:

Tests website changes without real people.

Business Areas:

A/B Testing, Data and Analytics

In recommender systems, online A/B testing is a crucial method for evaluating the performance of different models. However, online A/B testing often presents significant challenges, including substantial economic costs, degraded user experience, and considerable time requirements. Given the powerful capabilities of Large Language Models (LLMs), LLM-based agents show great potential to replace traditional online A/B testing. Nonetheless, current agents fail to simulate users' perception processes and interaction patterns because they lack real environments and visual perception capabilities. To address these challenges, we introduce a multi-modal user agent for A/B testing (A/B Agent). Specifically, we construct a recommendation sandbox environment for A/B testing that enables multimodal, multi-page interactions aligned with real user behavior on online platforms. The agent leverages multimodal information perception and fine-grained user preferences, and integrates profiles, action memory retrieval, and a fatigue system to simulate complex human decision-making. We validate the agent's potential as an alternative to traditional A/B testing from three perspectives: model, data, and features. Furthermore, we find that the data generated by A/B Agent can effectively enhance the capabilities of recommendation models. Our code is publicly available at https://github.com/Applied-Machine-Learning-Lab/ABAgent.
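To make the idea of agent-based A/B testing concrete, here is a minimal Python sketch. It is not the paper's A/B Agent and uses no LLM or visual perception: `SimulatedUser`, `click_rate`, the linear fatigue rule, and the click model are all hypothetical names and assumptions chosen for illustration. It only shows the general shape of the approach, in which simulated users with a preference profile, an action memory, and a fatigue level browse the slates produced by two recommender variants, and the resulting click rates are compared.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SimulatedUser:
    """Hypothetical stand-in for a user agent (not the paper's implementation)."""
    preferences: dict            # category -> base click probability in [0, 1]
    fatigue: float = 0.0         # 0.0 = fresh, 1.0 = stops browsing
    fatigue_step: float = 0.25   # assumed linear fatigue growth per viewed item
    memory: list = field(default_factory=list)  # (category, clicked) history

    def browse(self, items, rng):
        """View items in order until fatigued; return the number of clicks."""
        clicks = 0
        for category in items:
            if self.fatigue >= 1.0:
                break
            # Assumed click model: interest is damped as fatigue grows.
            p = self.preferences.get(category, 0.0) * (1.0 - self.fatigue)
            clicked = rng.random() < p
            self.memory.append((category, clicked))
            clicks += int(clicked)
            self.fatigue += self.fatigue_step
        return clicks


def click_rate(recommend, profiles, seed=0):
    """Clicks per viewed item for one recommender variant over all profiles."""
    rng = random.Random(seed)
    clicks = viewed = 0
    for prefs in profiles:
        user = SimulatedUser(preferences=dict(prefs))
        clicks += user.browse(recommend(prefs), rng)
        viewed += len(user.memory)
    return clicks / viewed if viewed else 0.0


def variant_a(prefs):
    """Variant A (hypothetical): alphabetical order, ignores preferences."""
    return sorted(prefs)


def variant_b(prefs):
    """Variant B (hypothetical): most-preferred categories first."""
    return sorted(prefs, key=prefs.get, reverse=True)


if __name__ == "__main__":
    profiles = [
        {"books": 0.1, "games": 0.8, "music": 0.4, "news": 0.2, "sport": 0.6},
        {"books": 0.7, "games": 0.1, "music": 0.3, "news": 0.5, "sport": 0.2},
    ]
    print("variant A click rate:", click_rate(variant_a, profiles))
    print("variant B click rate:", click_rate(variant_b, profiles))
```

Seeding the random generator keeps each simulated run reproducible, which is one practical advantage a sandbox has over a live experiment. A real evaluation would need many more simulated users and a significance test before preferring one variant.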

AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents

Human-Computer Interaction

Tests website changes with smart computer people.

13 Apr 2025

90%

A Reinforcement-Learning-Enhanced LLM Framework for Automated A/B Testing in Personalized Marketing

Information Retrieval

Makes ads show the best thing to each person.

27 May 2025

88%

UXAgent: A System for Simulating Usability Testing of Web Design with LLM Agents

Computation and Language

Tests website designs with fake users before launch.

13 Apr 2025


Country of Origin

🇨🇳 🇭🇰 China, Hong Kong

Repos / Data Links

github.com

Page Count

12 pages
