BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning
By: Yunpeng Qing, Shuo Chen, Yixiao Chi, and more
Potential Business Impact:
Helps robots learn better from past experiences.
Recent advances in offline Reinforcement Learning (RL) have shown that effective policy learning can benefit from imposing conservative constraints on pre-collected datasets. However, such static datasets often exhibit distribution bias, resulting in limited generalizability. To address this limitation, a straightforward solution is data augmentation (DA), which leverages generative models to enrich the data distribution. Despite promising results, current DA techniques focus solely on reconstructing future trajectories from given states, while ignoring the history transitions that lead to them. This single-direction paradigm inevitably hinders the discovery of diverse behavior patterns, especially those leading to critical states that may have yielded high-reward outcomes. In this work, we introduce Bidirectional Trajectory Diffusion (BiTrajDiff), a novel DA framework for offline RL that models both future and history trajectories from any intermediate state. Specifically, we decompose the trajectory generation task into two independent yet complementary diffusion processes: one generating forward trajectories to predict future dynamics, and the other generating backward trajectories to trace essential history transitions. BiTrajDiff can efficiently leverage critical states as anchors to expand into potentially valuable yet underexplored regions of the state space, thereby increasing dataset diversity. Extensive experiments on the D4RL benchmark suite demonstrate that BiTrajDiff achieves superior performance compared to other advanced DA methods across various offline RL backbones.
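To make the core idea concrete, here is a minimal sketch of bidirectional generation with two anchor-conditioned denoisers, one rolled out forward in time and one backward, stitched together at the anchor state. This is an illustrative toy under assumed details, not the authors' implementation: the `TrajDenoiser` architecture and the sizes `HORIZON`, `STATE_DIM`, and `N_STEPS` are hypothetical, and a standard DDPM ancestral sampler stands in for whatever sampler the paper uses.

```python
# Hypothetical sketch of BiTrajDiff-style bidirectional generation.
# Network, sizes, and noise schedule are placeholder assumptions.
import torch
import torch.nn as nn

HORIZON, STATE_DIM, N_STEPS = 16, 11, 50  # hypothetical sizes


class TrajDenoiser(nn.Module):
    """Predicts the noise added to a trajectory segment, conditioned on
    the anchor state and the diffusion timestep (placeholder MLP)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HORIZON * STATE_DIM + STATE_DIM + 1, 256),
            nn.ReLU(),
            nn.Linear(256, HORIZON * STATE_DIM),
        )

    def forward(self, traj, anchor, t):
        x = torch.cat([traj.flatten(1), anchor, t], dim=-1)
        return self.net(x)


@torch.no_grad()
def sample_segment(model, anchor, betas):
    """Reverse diffusion: denoise Gaussian noise into a trajectory
    segment conditioned on `anchor` (standard DDPM ancestral sampling)."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(anchor.shape[0], HORIZON, STATE_DIM)
    for step in reversed(range(N_STEPS)):
        t = torch.full((anchor.shape[0], 1), step / N_STEPS)
        eps = model(x, anchor, t).view_as(x)
        mean = (x - betas[step] / (1 - alpha_bars[step]).sqrt() * eps) / alphas[step].sqrt()
        noise = torch.randn_like(x) if step > 0 else torch.zeros_like(x)
        x = mean + betas[step].sqrt() * noise
    return x


# Two independent denoisers: one predicts future dynamics from the anchor,
# the other traces plausible history transitions that reach it.
forward_model, backward_model = TrajDenoiser(), TrajDenoiser()
betas = torch.linspace(1e-4, 2e-2, N_STEPS)

anchor = torch.randn(4, STATE_DIM)  # batch of critical anchor states
future = sample_segment(forward_model, anchor, betas)
history = sample_segment(backward_model, anchor, betas)  # generated back-to-front

# Stitch at the anchor: flip the history segment into chronological order,
# yielding an augmented trajectory centered on the critical state.
augmented = torch.cat([history.flip(1), anchor.unsqueeze(1), future], dim=1)
```

In this sketch the two diffusion processes are trained and sampled independently, which matches the abstract's "independent yet complementary" decomposition; the untrained models above would of course produce noise until fitted to an offline dataset.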
Similar Papers
DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions
Machine Learning (CS)
Teaches robots to learn from past experiences.
Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation
Machine Learning (CS)
Makes AI learn faster and better for games.
TransDiffuser: Diverse Trajectory Generation with Decorrelated Multi-modal Representation for End-to-end Autonomous Driving
Robotics
Helps self-driving cars plan safer, varied routes.