LLM-Driven Policy Diffusion: Enhancing Generalization in Offline Reinforcement Learning

Published: August 30, 2025 | arXiv ID: 2509.00347v1

By: Hanping Zhang, Yuhong Guo

Potential Business Impact:

Teaches robots new jobs from examples.

Business Areas:

Machine Learning Artificial Intelligence, Data and Analytics, Software

Reinforcement Learning (RL) is known for its strong decision-making capabilities and has been widely applied in real-world scenarios. However, as offline datasets become more available while well-designed, expert-built online environments remain scarce, the challenge of generalization in offline RL has become more prominent. Because offline data is limited, RL agents trained solely on collected experiences often struggle to generalize to new tasks or environments. To address this challenge, we propose LLM-Driven Policy Diffusion (LLMDPD), a novel approach that enhances generalization in offline RL using task-specific prompts. Our method incorporates both text-based task descriptions and trajectory prompts to guide policy learning. We use a large language model (LLM) to process the text-based prompts, drawing on its natural language understanding and extensive knowledge base to provide rich task-relevant context. In parallel, we encode the trajectory prompts with a transformer model, capturing structured behavioral patterns within the underlying transition dynamics. These prompts serve as conditional inputs to a context-aware policy-level diffusion model, enabling the RL agent to generalize effectively to unseen tasks. Our experimental results demonstrate that LLMDPD outperforms state-of-the-art offline RL methods on unseen tasks, highlighting its effectiveness in improving generalization and adaptability in diverse settings.
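The data flow described in the abstract (text prompt and trajectory prompt encoded into a context, which then conditions a diffusion policy) can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify architectures, so `embed_text` and `embed_trajectory` are hypothetical stand-ins for the LLM and transformer encoders, `predict_noise` is a hand-written formula rather than a trained network, the noise schedule is an assumed linear DDPM-style schedule, and the action is assumed to have the same dimension as the state purely for simplicity.

```python
import math
import random

def embed_text(description, dim=4):
    # Stand-in for the LLM text encoder (assumption): bucket character codes.
    vec = [0.0] * dim
    for i, ch in enumerate(description):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def embed_trajectory(trajectory, dim=4):
    # Stand-in for the transformer trajectory encoder (assumption): mean-pool
    # the flattened (state, action, reward) features of each transition.
    vec = [0.0] * dim
    for state, action, reward in trajectory:
        for i, f in enumerate(list(state) + list(action) + [reward]):
            vec[i % dim] += f
    n = max(len(trajectory), 1)
    return [v / n for v in vec]

def toy_target_action(state, context):
    # Hypothetical "clean" action that the toy denoiser steers toward.
    scale = sum(context) / len(context)
    return [math.tanh(scale * s) for s in state]

def predict_noise(noisy_action, state, context, alpha_bar_t):
    # Hand-written noise predictor; a real system would use a trained network.
    target = toy_target_action(state, context)
    return [(x - math.sqrt(alpha_bar_t) * a) / math.sqrt(1.0 - alpha_bar_t)
            for x, a in zip(noisy_action, target)]

def sample_action(state, context, steps=20, seed=0):
    # DDPM-style reverse process conditioned on the state and prompt context.
    rng = random.Random(seed)
    betas = [1e-4 + (0.02 - 1e-4) * t / (steps - 1) for t in range(steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    x = [rng.gauss(0.0, 1.0) for _ in state]  # start from pure noise
    for t in reversed(range(steps)):
        eps = predict_noise(x, state, context, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * e) / math.sqrt(alphas[t]) for xi, e in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x

# Illustrative prompts (made up for this sketch).
text_prompt = "move the arm to the red block"
trajectory_prompt = [((0.1, 0.2), (0.0, 0.5), 1.0), ((0.3, 0.1), (0.2, 0.4), 0.5)]
context = embed_text(text_prompt) + embed_trajectory(trajectory_prompt)
action = sample_action(state=(0.5, -0.2), context=context)
```

In this toy setup the only thing that changes between tasks is `context`, so swapping the text or trajectory prompt changes the action the reverse process converges to. That mirrors, in a very reduced form, the idea of conditioning one policy on task prompts instead of training a separate policy per task.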

Beyond Static LLM Policies: Imitation-Enhanced Reinforcement Learning for Recommendation

Information Retrieval

Makes movie suggestions faster and smarter.

15 Oct 2025 2

89%

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Computation and Language

Teaches AI to learn and solve problems better.

18 Nov 2025 1

89%

A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation

Artificial Intelligence

Teaches computers to learn tasks from simple instructions.

12 Dec 2025 1

Page Count

12 pages
