
CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space

Published: January 9, 2026 | arXiv ID: 2601.05675v1

By: Bingyi Liu, Jinbo He, Haiyong Shi and more

Potential Business Impact:

Helps robots learn complex tasks faster.

Business Areas:

Collaborative Consumption Collaboration

Hybrid action spaces, which combine discrete choices with continuous parameters, are prevalent in domains such as robot control and game AI. However, efficiently modeling and optimizing over hybrid discrete-continuous action spaces remains a fundamental challenge, mainly due to limited policy expressiveness and poor scalability in high-dimensional settings. To address this challenge, we view the hybrid action space problem as a fully cooperative game and propose the Cooperative Hybrid Diffusion Policies (CHDP) framework to solve it. CHDP employs two cooperative agents, one using a discrete diffusion policy and the other a continuous diffusion policy. The continuous policy is conditioned on the representation of the discrete action, explicitly modeling the dependency between the two. This cooperative design lets each diffusion policy use its expressiveness to capture complex distributions in its own action space. To mitigate the update conflicts that arise when both policies are updated simultaneously, we employ a sequential update scheme that fosters co-adaptation. Moreover, to improve scalability when learning in high-dimensional discrete action spaces, we construct a codebook that embeds the action space into a low-dimensional latent space. This mapping enables the discrete policy to learn in a compact, structured space. Finally, we design a Q-function-based guidance mechanism to align the codebook's embeddings with the discrete policy's representation during training. On challenging hybrid action benchmarks, CHDP outperforms the state-of-the-art method by up to 19.3% in success rate.
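To make the data flow in the abstract concrete, the following is a minimal sketch of selecting one hybrid action: a discrete policy emits a latent vector, a codebook maps it to a discrete action, and a continuous policy is conditioned on that action's embedding. It is not the authors' implementation. The two policy functions are hypothetical placeholders rather than diffusion models, the nearest-neighbor decoding rule is an assumption (the abstract only says the codebook embeds the action space into a latent space), and all names and sizes are chosen for illustration.

```python
import math
import random


def nearest_code(latent, codebook):
    """Return the index of the codebook row closest to `latent` (squared Euclidean).

    Assumption: nearest-neighbor decoding; the source does not state the rule.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(latent, codebook[k]))


def discrete_policy(state, rng):
    """Hypothetical stand-in for the discrete diffusion policy.

    Emits a 2-D latent vector: a noisy copy of the first two state features.
    """
    return [s + rng.gauss(0.0, 0.1) for s in state[:2]]


def continuous_policy(state, code, rng):
    """Hypothetical stand-in for the continuous diffusion policy.

    Conditioning is illustrated by feeding the state together with the
    embedding `code` of the chosen discrete action.
    """
    features = list(state) + list(code)
    return [math.tanh(sum(features) + rng.gauss(0.0, 0.1))]


def select_hybrid_action(state, codebook, rng):
    """Pick a discrete action first, then its continuous parameters."""
    latent = discrete_policy(state, rng)
    k = nearest_code(latent, codebook)
    params = continuous_policy(state, codebook[k], rng)
    return k, params


# Three discrete actions embedded in a 2-D latent space (illustrative values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
rng = random.Random(0)
k, params = select_hybrid_action([0.9, 1.1, 0.3], codebook, rng)
print(k, params)
```

The point of the sketch is the ordering: the continuous parameters are computed only after the discrete action is known, so they can depend on it. Training (the sequential update scheme and the Q-function-based guidance) is deliberately left out, since the abstract does not give enough detail to reproduce it.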

CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion

CV and Pattern Recognition

Robots learn better by remembering past actions.

17 Jun 2025 1

88%

ADPro: a Test-time Adaptive Diffusion Policy for Robot Manipulation via Manifold and Initial Noise Constraints

Robotics

Robots learn to do tasks faster and better.

8 Aug 2025 0

88%

Hybrid Consistency Policy: Decoupling Multi-Modal Diversity and Real-Time Efficiency in Robotic Manipulation

Robotics

Robots learn to move faster, like humans.

30 Oct 2025 1


Country of Origin

🇨🇳 🇭🇰 Hong Kong, China

Page Count

9 pages
