UserRL: Training Interactive User-Centric Agent via Reinforcement Learning

Published: September 24, 2025 | arXiv ID: 2509.19736v1

By: Cheng Qian, Zuxin Liu, Akshara Prabhakar and more

Potential Business Impact:

Teaches AI to have better conversations with people.

Business Areas:

Virtual Reality Hardware, Software

Reinforcement learning (RL) has shown promise in training agentic models that move beyond static benchmarks to engage in dynamic, multi-turn interactions. Yet the ultimate value of such agents lies in their ability to assist users, a setting where the diversity and dynamics of user interaction pose challenges. In this work, we propose UserRL, a unified framework for training and evaluating user-centric abilities through standardized gym environments paired with simulated users. We systematically vary turn-level reward assignment and trajectory-level score calculation to analyze how different formulations affect learning under the GRPO algorithm. Our experiments across Qwen3 models reveal three key findings: (i) SFT cold start is critical for unlocking initial interaction ability and enabling sustained RL improvements; (ii) deliberate trajectory scoring yields more efficient and effective multi-turn interactions; and (iii) while stronger simulated users (e.g., GPT-4o) facilitate training, open-source simulators (e.g., Qwen3-32B) remain a cost-effective and transferable option. Together, these results highlight that careful design of reward shaping and user simulation choice is as crucial as model scale, and establish UserRL as a practical pathway for developing robust user-centric agentic models. All code and data are publicly available for future research.
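
As a rough illustration of how turn-level rewards, a trajectory-level score, and GRPO fit together, the sketch below collapses each rollout's per-turn rewards into one score and then normalizes scores within a group of rollouts for the same task. This is not the paper's formulation: the discounted sum, the discount factor `gamma`, the use of the population standard deviation, the function names, and the reward values are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def trajectory_score(turn_rewards, gamma=0.9):
    """Collapse turn-level rewards into one trajectory-level score.

    Assumption: a discounted sum, so reward earned in earlier turns
    counts more than the same reward earned later.
    """
    return sum(r * gamma**t for t, r in enumerate(turn_rewards))


def group_relative_advantages(scores, eps=1e-8):
    """GRPO-style advantages: center and scale each score within its group."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population std; an assumption, implementations vary
    return [(s - mu) / (sigma + eps) for s in scores]


# Hypothetical group of three rollouts sampled for the same task.
group_turn_rewards = [
    [0.0, 0.0, 1.0],  # succeeds on the third turn
    [1.0, 0.0, 0.0],  # succeeds on the first turn
    [0.0, 0.0, 0.0],  # never succeeds
]

scores = [trajectory_score(r) for r in group_turn_rewards]
advantages = group_relative_advantages(scores)

for s, a in zip(scores, advantages):
    print(f"score={s:.2f}  advantage={a:+.3f}")
```

Under these assumptions the rollout that succeeds on the first turn receives the largest advantage, which is one way a trajectory-level score could favor shorter, more efficient interactions.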

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

Computation and Language

Teaches robots to use websites like people.

22 May 2025 2

89%

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Machine Learning (CS)

Teaches AI to solve hard problems by trying things.

10 Sep 2025 1

89%

MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use

Artificial Intelligence

Teaches AI to talk and use tools better.

26 Aug 2025 3

Repos / Data Links

github.com

Page Count

28 pages
