ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning
By: Wendi Chen, Han Xue, Yi Wang, and more
Potential Business Impact:
Robots learn to touch and move objects precisely.
Human-level contact-rich manipulation relies on the distinct roles of two key modalities: vision provides spatially rich but temporally slow global context, while force sensing captures rapid, high-frequency local contact dynamics. Integrating these signals is challenging due to their fundamental frequency and informational disparities. In this work, we propose ImplicitRDP, a unified end-to-end visual-force diffusion policy that integrates visual planning and reactive force control within a single network. We introduce Structural Slow-Fast Learning, a mechanism utilizing causal attention to simultaneously process asynchronous visual and force tokens, allowing the policy to perform closed-loop adjustments at the force frequency while maintaining the temporal coherence of action chunks. Furthermore, to mitigate modality collapse where end-to-end models fail to adjust the weights across different modalities, we propose Virtual-target-based Representation Regularization. This auxiliary objective maps force feedback into the same space as the action, providing a stronger, physics-grounded learning signal than raw force prediction. Extensive experiments on contact-rich tasks demonstrate that ImplicitRDP significantly outperforms both vision-only and hierarchical baselines, achieving superior reactivity and success rates with a streamlined training pipeline. Code and videos will be publicly available at https://implicit-rdp.github.io.
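The core of Structural Slow-Fast Learning is processing visual and force tokens that arrive at different rates within one causally masked attention stream. As a rough illustration of that idea (not the paper's actual implementation — the function names, token rates, and tie-breaking rule below are all assumptions), one can interleave slow visual tokens and fast force tokens by timestamp and build a causal mask so each token only attends to tokens that occurred at the same time or earlier:

```python
# Hypothetical sketch of asynchronous slow-fast token interleaving.
# Visual tokens arrive at a low rate, force tokens at a high rate;
# a causal mask restricts attention to past-or-present tokens.
# All names and rates are illustrative, not from the paper.

def interleave_tokens(vision_hz, force_hz, horizon_s):
    """Merge slow visual and fast force token timestamps into one
    chronologically ordered stream, tagging each token's modality."""
    tokens = [(i / vision_hz, "vision") for i in range(int(horizon_s * vision_hz))]
    tokens += [(i / force_hz, "force") for i in range(int(horizon_s * force_hz))]
    # On timestamp ties, place vision before force so the fast tokens
    # can condition on the co-occurring visual context (an assumption).
    order = {"vision": 0, "force": 1}
    tokens.sort(key=lambda t: (t[0], order[t[1]]))
    return tokens

def causal_mask(tokens):
    """mask[i][j] is True iff token i may attend to token j,
    i.e. j appears at the same position or earlier in the stream."""
    n = len(tokens)
    return [[j <= i for j in range(n)] for i in range(n)]

# Example: 10 Hz vision, 100 Hz force, over a 0.1 s window
# yields 1 visual token and 10 force tokens in one ordered stream.
stream = interleave_tokens(10, 100, 0.1)
mask = causal_mask(stream)
```

Under this masking, every high-frequency force token can still attend to the latest visual token, which is what lets the policy react at the force rate while the visual context updates slowly.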
Similar Papers
Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation
Robotics
Robots learn to touch and react like humans.
3D Flow Diffusion Policy: Visuomotor Policy Learning via Generating Flow in 3D Space
Robotics
Robots learn to grab and move things better.
Unified Multimodal Diffusion Forcing for Forceful Manipulation
Robotics
Teaches robots to learn from seeing, doing, and feeling.