GraspView: Active Perception Scoring and Best-View Optimization for Robotic Grasping in Cluttered Environments

Published: November 6, 2025 | arXiv ID: 2511.04199v1

By: Shenglin Wang, Mingtong Dai, Jingxuan Su and more

Potential Business Impact:

Robots grab things better using only pictures.

Business Areas:

Image Recognition Data and Analytics, Software

Robotic grasping is a fundamental capability for autonomous manipulation, yet it remains highly challenging in cluttered environments, where occlusion, poor perception quality, and inconsistent 3D reconstructions often lead to unstable or failed grasps. Conventional pipelines rely widely on RGB-D cameras for geometric information, but these sensors fail on transparent or glossy objects and degrade at close range. We present GraspView, an RGB-only robotic grasping pipeline that achieves accurate manipulation in cluttered environments without depth sensors. Our framework integrates three key components: (i) global perception scene reconstruction, which provides locally consistent, up-to-scale geometry from a single RGB view and fuses multi-view projections into a coherent global 3D scene; (ii) a render-and-score active perception strategy, which dynamically selects next-best-views to reveal occluded regions; and (iii) an online metric alignment module that calibrates VGGT predictions against robot kinematics to ensure physical scale consistency. Building on these purpose-built modules, GraspView performs best-view global grasping, fusing multi-view reconstructions and leveraging GraspNet for robust execution. Experiments on diverse tabletop objects demonstrate that GraspView significantly outperforms both RGB-D and single-view RGB baselines, especially under heavy occlusion, in near-field sensing, and with transparent objects. These results highlight GraspView as a practical and versatile alternative to RGB-D pipelines, enabling reliable grasping in unstructured real-world environments.
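
The abstract does not spell out how the online metric alignment module works. As a rough illustration only, one common way to recover physical scale for an up-to-scale reconstruction is to compare camera displacements predicted by the reconstruction with the displacements reported by the robot's forward kinematics, then solve for a single scale factor by least squares. The sketch below assumes exactly that; the function name `estimate_metric_scale` and its inputs are hypothetical and are not taken from the paper.

```python
import numpy as np


def estimate_metric_scale(pred_positions, robot_positions):
    """Estimate one scale factor s such that s * (predicted motion) ~ (robot motion).

    Hypothetical helper, not the authors' method.
    pred_positions:  (N, 3) camera centers from an up-to-scale reconstruction.
    robot_positions: (N, 3) camera centers from robot kinematics, in meters.
    """
    pred = np.asarray(pred_positions, dtype=float)
    robot = np.asarray(robot_positions, dtype=float)
    if pred.shape != robot.shape or pred.ndim != 2 or pred.shape[0] < 2:
        raise ValueError("need two matching (N, 3) arrays with N >= 2")

    # Lengths of consecutive camera displacements. Using lengths makes the
    # estimate independent of any rotation or offset between the two frames.
    d_pred = np.linalg.norm(np.diff(pred, axis=0), axis=1)
    d_robot = np.linalg.norm(np.diff(robot, axis=0), axis=1)

    denom = float(d_pred @ d_pred)
    if denom == 0.0:
        raise ValueError("predicted camera positions do not move")

    # Closed-form minimizer of sum_i (s * d_pred[i] - d_robot[i])^2.
    return float(d_pred @ d_robot) / denom


if __name__ == "__main__":
    # Synthetic check: the robot trajectory is the predicted one shrunk by 0.25.
    pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
    robot = 0.25 * pred
    print(estimate_metric_scale(pred, robot))
```

Multiplying the reconstructed point cloud by the returned factor would bring it into metric units under these assumptions. A real system would likely also need outlier handling and a full pose alignment, which this sketch omits.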

ActiveGrasp: Information-Guided Active Grasping with Calibrated Energy-based Model

Robotics

Helps robots grab things in messy places.

16 Nov 2025

89%

RoboRetriever: Single-Camera Robot Object Retrieval via Active and Interactive Perception with Dynamic Scene Graph

Robotics

Robot finds things using one camera and words.

18 Aug 2025

89%

VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility

Robotics

Helps robots grab things even when hidden.

16 Mar 2025

Page Count

9 pages
