RealD$^2$iff: Bridging Real-World Gap in Robot Manipulation via Depth Diffusion
By: Xiujian Liang, Jiacheng Liu, Mingyang Sun, and more
Potential Business Impact:
Robots learn to see the real world from fake images.
Robot manipulation in the real world is fundamentally constrained by the visual sim2real gap: depth observations collected in simulation fail to reflect the complex noise patterns inherent to real sensors. In this work, inspired by the denoising capability of diffusion models, we invert the conventional perspective and propose a clean-to-noisy paradigm that learns to synthesize noisy depth, thereby bridging the visual sim2real gap through purely simulation-driven robotic learning. Building on this idea, we introduce RealD$^2$iff, a hierarchical coarse-to-fine diffusion framework that decomposes depth noise into global structural distortions and fine-grained local perturbations. To enable progressive learning of these components, we further develop two complementary strategies: Frequency-Guided Supervision (FGS) for global structure modeling and Discrepancy-Guided Optimization (DGO) for localized refinement. To integrate RealD$^2$iff seamlessly into imitation learning, we construct a six-stage pipeline. Comprehensive experiments validate the effectiveness of this paradigm. RealD$^2$iff enables two key applications: (1) generating real-world-like depth to construct clean-noisy paired datasets without manual sensor data collection, and (2) achieving zero-shot sim2real robot manipulation, substantially improving real-world performance without additional fine-tuning.
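To make the clean-to-noisy paradigm concrete, the sketch below illustrates the general idea with a standard DDPM-style forward process: the diffusion target is the *noisy* (real-sensor-like) depth, conditioned on clean simulated depth, so that sampling from the trained model synthesizes realistic sensor noise. This is a minimal, hypothetical illustration in NumPy with a trivial least-squares stand-in for the denoising network; all names (`q_sample`, the noise schedule, the toy data) are assumptions for exposition, not the paper's actual architecture, FGS, or DGO components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule, as in standard DDPM formulations.
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, eps):
    """Forward diffusion: corrupt the target (real-like noisy depth) at step t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

# Toy paired data: "clean" simulated depth and a stand-in for real sensor depth.
clean = rng.uniform(0.5, 2.0, size=(16, 64))             # batch of depth rows (meters)
noisy = clean + 0.05 * rng.standard_normal(clean.shape)  # clean + synthetic sensor noise

# Sample timesteps and Gaussian noise, then diffuse the noisy-depth target.
t = rng.integers(0, T, size=clean.shape[0])
eps = rng.standard_normal(clean.shape)
x_t = q_sample(noisy, t[:, None], eps)

# Epsilon-prediction objective, with a trivial linear "network" conditioned on
# the clean depth (a real model would be a conditional U-Net or similar).
features = np.concatenate([x_t, clean], axis=1)
W, *_ = np.linalg.lstsq(features, eps, rcond=None)
loss = float(np.mean((features @ W - eps) ** 2))
print("toy eps-prediction loss:", round(loss, 4))
```

At inference, one would run the reverse process conditioned on clean simulated depth to hallucinate sensor-like noise, producing the clean-noisy pairs the abstract describes without collecting real sensor data.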
Similar Papers
DiffuDepGrasp: Diffusion-based Depth Noise Modeling Empowers Sim2Real Robotic Grasping
Robotics
Robots learn to grab things perfectly, even with bad camera views.
R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation
CV and Pattern Recognition
Makes self-driving cars test in realistic fake worlds.
High-Fidelity Digital Twins for Bridging the Sim2Real Gap in LiDAR-Based ITS Perception
CV and Pattern Recognition
Makes self-driving cars see better in real life.