Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation
By: Xintong Duan, Yutong He, Fahim Tajwar, and more
Potential Business Impact:
Makes AI that controls games and robots much faster to run while keeping, or improving, its performance.
Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. Consistency models offer a potential remedy, but their applications to decision-making often struggle with suboptimal demonstrations or rely on complex concurrent training of multiple networks. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method enables single-step generation while maintaining higher performance and simpler training. Empirical evaluations on the Gym MuJoCo benchmarks and long-horizon planning tasks show that our approach achieves an 8.7% improvement over the previous state-of-the-art while offering up to 142x faster inference than its diffusion counterparts.
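To make the idea concrete, the sketch below shows what folding a reward term into consistency distillation could look like in practice: a diffusion "teacher" supervises a one-step "student" denoiser, and the distillation loss is augmented with a reward bonus on the student's generated trajectories. This is not the authors' implementation; the teacher, student, reward model, noise schedule, trajectory dimension, and all hyperparameters are illustrative placeholders.

```python
# Minimal sketch (illustrative, not the paper's code): reward-aware consistency distillation.
import torch
import torch.nn as nn
import torch.nn.functional as F

TRAJ_DIM = 32  # flattened (horizon x state-action) size, illustrative only

class Denoiser(nn.Module):
    """Predicts a clean trajectory from a noisy one and its noise level."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(TRAJ_DIM + 1, 128), nn.SiLU(),
                                 nn.Linear(128, TRAJ_DIM))
    def forward(self, x, sigma):
        return self.net(torch.cat([x, sigma], dim=-1))

teacher, student = Denoiser(), Denoiser()          # teacher stands in for a pretrained diffusion model
reward_model = nn.Sequential(nn.Linear(TRAJ_DIM, 64), nn.SiLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
sigmas = torch.linspace(0.02, 1.0, 50)              # noise schedule, illustrative
reward_weight = 0.1                                  # trade-off between imitation and reward

def distillation_step(clean_traj):
    """One training step: consistency loss against the teacher plus a reward bonus."""
    t = torch.randint(0, len(sigmas), (clean_traj.shape[0], 1))
    sigma = sigmas[t]
    noisy = clean_traj + sigma * torch.randn_like(clean_traj)

    with torch.no_grad():
        target = teacher(noisy, sigma)               # teacher's denoised estimate

    pred = student(noisy, sigma)                     # single-step student output
    # Distillation term keeps the student consistent with the teacher;
    # the reward term pushes generated trajectories toward higher return.
    loss = F.mse_loss(pred, target) - reward_weight * reward_model(pred).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage on a random batch standing in for offline trajectories:
print(distillation_step(torch.randn(8, TRAJ_DIM)))
```

After training, the student produces a full trajectory in one forward pass instead of the many denoising steps a diffusion planner would need, which is where the reported inference speedup comes from.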
Similar Papers
ROCM: RLHF on consistency models
Machine Learning (CS)
Teaches fast, single-step generative AI to follow human preferences.
BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning
Machine Learning (CS)
Helps robots learn better from past experiences.
Intra-Trajectory Consistency for Reward Modeling
Machine Learning (CS)
Teaches AI to judge answers better.