NVSPolicy: Adaptive Novel-View Synthesis for Generalizable Language-Conditioned Policy Learning
By: Le Shi, Yifei Shi, Xin Xu, and more
Potential Business Impact:
Robots see more, learn faster, and do tasks better.
Recent advances in deep generative models demonstrate unprecedented zero-shot generalization capabilities, offering great potential for robot manipulation in unstructured environments. Given a partial observation of a scene, a deep generative model can generate the unseen regions and thereby provide additional context, which enhances a robot's ability to generalize to unseen environments. However, visual artifacts in the generated images and inefficient integration of multi-modal features in policy learning leave this direction an open challenge. We introduce NVSPolicy, a generalizable language-conditioned policy learning method that couples an adaptive novel-view synthesis module with a hierarchical policy network. Given an input image, NVSPolicy dynamically selects an informative viewpoint and synthesizes a novel-view image from it to enrich the visual context. To mitigate the impact of imperfect synthesized images, we adopt a cycle-consistent VAE mechanism that disentangles the visual features into a semantic feature and a remaining feature. The two features are fed into the hierarchical policy network separately: the semantic feature informs high-level meta-skill selection, and the remaining feature guides low-level action estimation. Moreover, we propose several practical mechanisms to keep the method efficient. Extensive experiments on CALVIN demonstrate the state-of-the-art performance of our method: it achieves an average success rate of 90.4% across all tasks, substantially outperforming recent methods. Ablation studies confirm the significance of our adaptive novel-view synthesis paradigm. In addition, we evaluate NVSPolicy on a real-world robotic platform to demonstrate its practical applicability.
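To make the two-branch design concrete, here is a minimal sketch of a hierarchical policy that consumes the two disentangled features: the semantic feature (plus a language embedding) drives meta-skill selection, and the remaining feature drives low-level action estimation. This is not the authors' implementation; the class name `HierarchicalPolicy`, the feature dimensions, and the skill/action sizes are all illustrative assumptions.

```python
# Hedged sketch of the hierarchical policy described in the abstract.
# All module names, dimensions, and sizes below are assumptions, not
# the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalPolicy(nn.Module):
    def __init__(self, sem_dim=256, rem_dim=256, lang_dim=512,
                 num_skills=8, action_dim=7):
        super().__init__()
        self.num_skills = num_skills
        # High level: select a meta-skill from the semantic feature
        # concatenated with the language-instruction embedding.
        self.skill_head = nn.Sequential(
            nn.Linear(sem_dim + lang_dim, 256), nn.ReLU(),
            nn.Linear(256, num_skills),
        )
        # Low level: regress an action from the remaining feature,
        # conditioned on the selected skill (one-hot for simplicity).
        self.action_head = nn.Sequential(
            nn.Linear(rem_dim + num_skills, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, sem_feat, rem_feat, lang_feat):
        skill_logits = self.skill_head(torch.cat([sem_feat, lang_feat], dim=-1))
        skill = F.one_hot(skill_logits.argmax(dim=-1), self.num_skills).float()
        action = self.action_head(torch.cat([rem_feat, skill], dim=-1))
        return skill_logits, action


# Toy usage with random tensors standing in for the cycle-consistent
# VAE outputs and the language encoder.
policy = HierarchicalPolicy()
sem = torch.randn(1, 256)    # semantic feature (hypothetical dimension)
rem = torch.randn(1, 256)    # remaining feature (hypothetical dimension)
lang = torch.randn(1, 512)   # language-instruction embedding
logits, action = policy(sem, rem, lang)
print(logits.shape, action.shape)  # torch.Size([1, 8]) torch.Size([1, 7])
```

The design choice the sketch illustrates is the separation of concerns claimed in the abstract: artifact-prone appearance information stays in the low-level branch, so synthesis errors are less likely to corrupt the discrete meta-skill decision.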
Similar Papers
Zero-Shot Visual Generalization in Robot Manipulation
Robotics
Robots learn to do tasks in new places.
Imagination at Inference: Synthesizing In-Hand Views for Robust Visuomotor Policy Inference
Robotics
Robots can "see" better without extra cameras.
AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction
CV and Pattern Recognition
Creates realistic 3D objects from a single picture.