Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping
By: Ziyi Wang, Yuxuan Lu, Yimeng Zhang, and more
Potential Business Impact:
Helps online stores simulate shoppers who act like you.
Simulating step-wise human behavior with Large Language Models (LLMs) has become an emerging research direction, enabling applications in various practical domains. While prior methods, including prompting, supervised fine-tuning (SFT), and reinforcement learning (RL), have shown promise in modeling step-wise behavior, they primarily learn a population-level policy without conditioning on a user's persona, yielding generic rather than personalized simulations. In this work, we pose a critical question: how can LLM agents better simulate personalized user behavior? We introduce Customer-R1, an RL-based method for personalized, step-wise user behavior simulation in online shopping environments. Our policy is conditioned on an explicit persona, and we optimize next-step rationale and action generation via action-correctness reward signals. Experiments on the OPeRA dataset demonstrate that Customer-R1 not only significantly outperforms prompting and SFT-based baselines on the next-action prediction task, but also better matches users' action distribution, indicating higher fidelity in personalized behavior simulation.
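To make the setup concrete, below is a minimal sketch (not the authors' code) of the two ingredients the abstract names: a prompt that conditions the policy on an explicit persona plus the step-wise history, and a binary action-correctness reward used as the RL signal. All names here (Action, build_prompt, the action fields) are illustrative assumptions; the actual OPeRA schema and Customer-R1 implementation may differ.

from dataclasses import dataclass


@dataclass
class Action:
    """A single simulated shopping step, e.g. search / click / purchase."""
    kind: str    # action type, e.g. "click", "search", "purchase" (assumed labels)
    target: str  # element or query the action applies to


def build_prompt(persona: str, history: list[Action]) -> str:
    """Condition the policy on an explicit persona plus the interaction history,
    asking for a rationale followed by the next action."""
    past = "\n".join(f"- {a.kind}: {a.target}" for a in history)
    return (
        f"Persona:\n{persona}\n\n"
        f"Interaction history:\n{past}\n\n"
        "First give a short rationale, then the next action as `kind: target`."
    )


def action_correctness_reward(predicted: Action, gold: Action) -> float:
    """Binary reward: 1.0 only when the generated next action matches the
    logged user action; this is the kind of signal the RL objective optimizes."""
    same_kind = predicted.kind == gold.kind
    same_target = predicted.target.strip().lower() == gold.target.strip().lower()
    return 1.0 if (same_kind and same_target) else 0.0


if __name__ == "__main__":
    persona = "Budget-conscious shopper who compares prices before buying."
    history = [Action("search", "wireless earbuds under $50")]
    gold = Action("click", "Soundcore P20i product page")

    print(build_prompt(persona, history))
    print(action_correctness_reward(Action("click", "soundcore p20i product page"), gold))  # 1.0

In practice the reward would be computed on the action parsed out of the model's generated rationale-plus-action text; the exact matching rule (exact string match vs. type-level match) is an assumption here.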
Similar Papers
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
Computation and Language
Keeps AI characters acting like themselves.
Personas within Parameters: Fine-Tuning Small Language Models with Low-Rank Adapters to Mimic User Behaviors
Information Retrieval
Helps apps learn what you like faster.
Can we use LLMs to bootstrap reinforcement learning? -- A case study in digital health behavior change
Machine Learning (CS)
Helps apps learn how to help people change habits.