You Only Pose Once: A Minimalist's Detection Transformer for Monocular RGB Category-level 9D Multi-Object Pose Estimation

Published: August 20, 2025 | arXiv ID: 2508.14965v1

By: Hakjin Lee, Junghoon Seo, Jaehoon Sim

Potential Business Impact:

Helps robots understand object positions from pictures.

Business Areas:

Image Recognition Data and Analytics, Software

Accurately recovering the full 9-DoF pose of unseen instances within specific categories from a single RGB image remains a core challenge for robotics and automation. Most existing solutions still rely on pseudo-depth, CAD models, or multi-stage cascades that separate 2D detection from pose estimation. Motivated by the need for a simpler, RGB-only alternative that learns directly at the category level, we revisit a longstanding question: Can object detection and 9-DoF pose estimation be unified with high performance, without any additional data? We show that they can with our method, YOPO, a single-stage, query-based framework that treats category-level 9-DoF estimation as a natural extension of 2D detection. YOPO augments a transformer detector with a lightweight pose head, a bounding-box-conditioned translation module, and a 6D-aware Hungarian matching cost. The model is trained end-to-end only with RGB images and category-level pose labels. Despite its minimalist design, YOPO sets a new state of the art on three benchmarks. On the REAL275 dataset, it achieves 79.6% $\mathrm{IoU}_{50}$ and 54.1% under the $10^\circ\,10\,\mathrm{cm}$ metric, surpassing prior RGB-only methods and closing much of the gap to RGB-D systems. The code, models, and additional qualitative results can be found on our project page.
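To make the $10^\circ\,10\,\mathrm{cm}$ criterion mentioned in the abstract concrete, here is a minimal sketch of how a single prediction could be checked against it, assuming the usual reading of the metric: a pose counts as correct when its rotation error is at most 10 degrees and its translation error is at most 10 cm. The function names, the nested-list matrix representation, and the assumption that translations are given in metres are illustrative choices, not taken from the paper. The sketch also ignores the symmetry handling and detection matching that a full benchmark evaluation would need.

```python
import math

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices (nested lists)."""
    # trace(R_pred^T @ R_gt) equals the sum of elementwise products.
    trace = sum(R_pred[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error_cm(t_pred, t_gt):
    """Euclidean distance in centimetres (assumption: inputs are in metres)."""
    return 100.0 * math.dist(t_pred, t_gt)

def within_threshold(R_pred, t_pred, R_gt, t_gt, max_deg=10.0, max_cm=10.0):
    """True when both the rotation and the translation error are within the limits."""
    return (rotation_error_deg(R_pred, R_gt) <= max_deg
            and translation_error_cm(t_pred, t_gt) <= max_cm)

def rot_z(deg):
    """Hypothetical helper: rotation about the z-axis by `deg` degrees."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a),  math.cos(a), 0.0],
            [0.0,          0.0,         1.0]]

if __name__ == "__main__":
    R_gt, t_gt = rot_z(0.0), [0.0, 0.0, 0.50]
    # 5 degrees and 5 cm off: passes under the assumed definition.
    print(within_threshold(rot_z(5.0), [0.0, 0.0, 0.55], R_gt, t_gt))
    # 15 degrees off: fails on rotation alone.
    print(within_threshold(rot_z(15.0), [0.0, 0.0, 0.55], R_gt, t_gt))
```

A benchmark score such as the 54.1% reported above would then aggregate this per-object check over a whole test set; the exact aggregation protocol is not described on this page.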

Unified Category-Level Object Detection and Pose Estimation from RGB Images using 3D Prototypes

CV and Pattern Recognition

Lets computers see objects in 3D from photos.

4 Aug 2025 2

89%

RCGNet: RGB-based Category-Level 6D Object Pose Estimation with Geometric Guidance

CV and Pattern Recognition

Lets computers guess object position from pictures.

19 Aug 2025 0

88%

Beyond 'Templates': Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View

CV and Pattern Recognition

Helps robots understand and grab any object.

13 Oct 2025 0

Page Count

8 pages
