Score: 2

Unified Category-Level Object Detection and Pose Estimation from RGB Images using 3D Prototypes

Published: August 4, 2025 | arXiv ID: 2508.02157v1

By: Tom Fischer, Xiaojie Zhang, Eddy Ilg

Potential Business Impact:

Lets computers see objects in 3D from photos.

Recognizing objects in images is a fundamental problem in computer vision. Although detecting objects in 2D images is common, many applications require determining their pose in 3D space. Traditional category-level methods rely on RGB-D inputs, which may not always be available, or employ two-stage approaches that use separate models and representations for detection and pose estimation. For the first time, we introduce a unified model that integrates detection and pose estimation into a single framework for RGB images by leveraging neural mesh models with learned features and multi-model RANSAC. Our approach achieves state-of-the-art results for RGB category-level pose estimation on REAL275, improving on the current state-of-the-art by 22.9% averaged across all scale-agnostic metrics. Finally, we demonstrate that our unified method exhibits greater robustness compared to single-stage baselines. Our code and models are available at https://github.com/Fischer-Tom/unified-detection-and-pose-estimation.

RCGNet: RGB-based Category-Level 6D Object Pose Estimation with Geometric Guidance

CV and Pattern Recognition

Lets computers guess object position from pictures.

19 Aug 2025

90%

Beyond 'Templates': Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View

CV and Pattern Recognition

Helps robots understand and grab any object.

13 Oct 2025

90%

Universal Features Guided Zero-Shot Category-Level Object Pose Estimation

CV and Pattern Recognition

Teaches robots to grab new things they've never seen.

6 Jan 2025



Page Count

18 pages
