Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning

Published: July 23, 2025 | arXiv ID: 2507.17842v1

By: Yimeng Zhang, Tian Wang, Jiri Gesi and more

Potential Business Impact:

Teaches computers to shop like people.

Business Areas:

Personalization Commerce and Shopping

Large Language Models (LLMs) have recently demonstrated strong potential in generating 'believable human-like' behavior in web environments. Prior work has explored augmenting training data with LLM-synthesized rationales and applying supervised fine-tuning (SFT) to enhance reasoning ability, which in turn can improve downstream action prediction. However, the performance of such approaches remains inherently bounded by the reasoning capabilities of the model used to generate the rationales. In this paper, we introduce Shop-R1, a novel reinforcement learning (RL) framework aimed at enhancing the reasoning ability of LLMs for simulating real human behavior in online shopping environments. Specifically, Shop-R1 decomposes the human behavior simulation task into two stages, rationale generation and action prediction, each guided by distinct reward signals. For rationale generation, we leverage internal model signals (e.g., logit distributions) to guide the reasoning process in a self-supervised manner. For action prediction, we propose a hierarchical reward structure with difficulty-aware scaling to prevent reward hacking and enable fine-grained reward assignment. This design evaluates both high-level action types and the correctness of fine-grained sub-action details (attributes and values), rewarding outputs proportionally to their difficulty. Experimental results show that our method achieves a relative improvement of over 65% compared to the baseline.
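
To make the hierarchical, difficulty-aware action reward described in the abstract more concrete, here is a minimal Python sketch. It is not the authors' implementation: the action type names, the `DIFFICULTY` weights, the `type_credit` split, and the exact-match comparison of attribute values are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, field

# Hypothetical difficulty weights (assumption, not from the paper):
# action types that are harder to predict are scaled to earn more reward,
# so a policy cannot score well by always emitting the easiest action.
DIFFICULTY = {"terminate": 1.0, "click": 2.0, "type_and_submit": 3.0}


@dataclass
class Action:
    """A predicted or logged action: a high-level type plus attribute/value details."""
    type: str
    details: dict = field(default_factory=dict)


def hierarchical_reward(pred: Action, gold: Action, type_credit: float = 0.25) -> float:
    """Score a prediction in two levels, scaled by the difficulty of the gold action.

    Level 1: the high-level action type must match, otherwise the reward is 0.
    Level 2: the fraction of gold attribute/value pairs reproduced exactly.
    """
    if pred.type != gold.type:
        return 0.0
    scale = DIFFICULTY.get(gold.type, 1.0)
    if not gold.details:
        # No finer-grained details to check: a correct type earns the full scaled reward.
        return scale
    matched = sum(1 for key, value in gold.details.items() if pred.details.get(key) == value)
    detail_score = matched / len(gold.details)
    return scale * (type_credit + (1.0 - type_credit) * detail_score)


if __name__ == "__main__":
    gold = Action("type_and_submit", {"name": "search_input", "text": "running shoes"})
    partial = Action("type_and_submit", {"name": "search_input", "text": "sneakers"})
    wrong_type = Action("terminate")
    print(hierarchical_reward(gold, gold))
    print(hierarchical_reward(partial, gold))
    print(hierarchical_reward(wrong_type, gold))
```

Under these assumed weights, a prediction with the right action type but wrong details earns only a small partial credit, while a fully correct prediction of a harder action earns more than a fully correct easy one, which is the intuition behind rewarding outputs in proportion to their difficulty.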

Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping

Computation and Language

Helps online stores act like you.

8 Oct 2025

91%

See, Think, Act: Online Shopper Behavior Simulation with VLM Agents

Computers and Society

Helps online shoppers make better choices by seeing.

22 Oct 2025

91%

Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle

Computation and Language

Teaches computers to think and follow instructions better.

20 Sep 2025

Page Count

14 pages
