Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping
By: Ziyi Wang, Yuxuan Lu, Yimeng Zhang, and more
Potential Business Impact:
Helps online stores simulate shoppers who act like you.
Simulating step-wise human behavior with Large Language Models (LLMs) has become an emerging research direction, enabling applications in various practical domains. While prior methods, including prompting, supervised fine-tuning (SFT), and reinforcement learning (RL), have shown promise in modeling step-wise behavior, they primarily learn a population-level policy without conditioning on a user's persona, yielding generic rather than personalized simulations. In this work, we pose a critical question: how can LLM agents better simulate personalized user behavior? We introduce Customer-R1, an RL-based method for personalized, step-wise user behavior simulation in online shopping environments. Our policy is conditioned on an explicit persona, and we optimize next-step rationale and action generation via action-correctness reward signals. Experiments on the OPeRA dataset demonstrate that Customer-R1 not only significantly outperforms prompting and SFT-based baselines on the next-action prediction task, but also better matches users' action distribution, indicating higher fidelity in personalized behavior simulation.
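To make the setup concrete, below is a minimal sketch (not the authors' code) of the two ingredients the abstract names: a prompt that conditions the policy on an explicit persona plus the step-wise history, and a binary action-correctness reward used as the RL signal. All names here (Action, build_prompt, the action fields) are illustrative assumptions; the actual OPeRA schema and Customer-R1 implementation may differ.

from dataclasses import dataclass


@dataclass
class Action:
    """A single simulated shopping step, e.g. search / click / purchase."""
    kind: str    # action type, e.g. "click", "search", "purchase" (assumed labels)
    target: str  # element or query the action applies to


def build_prompt(persona: str, history: list[Action]) -> str:
    """Condition the policy on an explicit persona plus the interaction history,
    asking for a rationale followed by the next action."""
    past = "\n".join(f"- {a.kind}: {a.target}" for a in history)
    return (
        f"Persona:\n{persona}\n\n"
        f"Interaction history:\n{past}\n\n"
        "First give a short rationale, then the next action as `kind: target`."
    )


def action_correctness_reward(predicted: Action, gold: Action) -> float:
    """Binary reward: 1.0 only when the generated next action matches the
    logged user action; this is the kind of signal the RL objective optimizes."""
    same_kind = predicted.kind == gold.kind
    same_target = predicted.target.strip().lower() == gold.target.strip().lower()
    return 1.0 if (same_kind and same_target) else 0.0


if __name__ == "__main__":
    persona = "Budget-conscious shopper who compares prices before buying."
    history = [Action("search", "wireless earbuds under $50")]
    gold = Action("click", "Soundcore P20i product page")

    print(build_prompt(persona, history))
    print(action_correctness_reward(Action("click", "soundcore p20i product page"), gold))  # 1.0

In practice the reward would be computed on the action parsed out of the model's generated rationale-plus-action text; the exact matching rule (exact string match vs. type-level match) is an assumption here.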
Similar Papers
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
Computation and Language
Keeps AI characters acting like themselves.
Personas within Parameters: Fine-Tuning Small Language Models with Low-Rank Adapters to Mimic User Behaviors
Information Retrieval
Helps apps learn what you like faster.
Can we use LLMs to bootstrap reinforcement learning? -- A case study in digital health behavior change
Machine Learning (CS)
Helps apps learn how to help people change habits.