GenDet: Painting Colored Bounding Boxes on Images via Diffusion Model for Object Detection
By: Chen Min, Chengyang Li, Fanjie Kong, and more
This paper presents GenDet, a novel framework that redefines object detection as an image generation task. In contrast to traditional discriminative approaches, GenDet leverages generative modeling: it conditions on the input image and directly generates bounding boxes with semantic annotations in the original image space. GenDet builds a conditional generation architecture on the large-scale pre-trained Stable Diffusion model, formulating the detection task as semantic constraints within the latent space. This enables precise control over bounding box positions and category attributes while preserving the flexibility of the generative model. The methodology effectively bridges the gap between generative models and discriminative tasks, offering a fresh perspective on building unified visual understanding systems. Systematic experiments demonstrate that GenDet achieves accuracy competitive with discriminative detectors while retaining the flexibility characteristic of generative methods.
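The abstract's core idea, representing detections as colored bounding boxes painted directly in image space, can be illustrated with a small sketch. The snippet below is not the authors' code: the class-to-color palette, the outline-drawing convention, and the `paint_boxes` helper are all assumptions made for illustration. It renders box annotations as a colored target image of the kind a conditional diffusion model could be trained to generate from the input image.

```python
import numpy as np

# Hypothetical per-class RGB colors; the paper's actual palette is not specified.
CLASS_COLORS = {0: (255, 0, 0), 1: (0, 255, 0), 2: (0, 0, 255)}

def paint_boxes(image_hw, boxes):
    """Render boxes as a colored target image in the original image space.

    image_hw: (height, width) of the image
    boxes:    list of (class_id, x1, y1, x2, y2) in pixel coordinates
    Returns an HxWx3 uint8 canvas with each box outline drawn in its
    class color -- one plausible generation target for a model that
    "paints" detections onto the image plane.
    """
    h, w = image_hw
    canvas = np.zeros((h, w, 3), dtype=np.uint8)
    for cls, x1, y1, x2, y2 in boxes:
        color = np.array(CLASS_COLORS[cls], dtype=np.uint8)
        canvas[y1:y2 + 1, x1, :] = color  # left edge
        canvas[y1:y2 + 1, x2, :] = color  # right edge
        canvas[y1, x1:x2 + 1, :] = color  # top edge
        canvas[y2, x1:x2 + 1, :] = color  # bottom edge
    return canvas

# Two boxes on a 64x64 canvas: class 0 (red) and class 2 (blue).
target = paint_boxes((64, 64), [(0, 10, 10, 30, 40), (2, 35, 5, 60, 20)])
```

Recovering boxes and classes from such a generated image would then reduce to detecting colored rectangles and mapping outline colors back to categories, which is one way the generative output could be read off as discriminative detections.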
Similar Papers
FlowDet: Unifying Object Detection and Generative Transport Flows
CV and Pattern Recognition
Finds objects in pictures much faster.
RTGen: Real-Time Generative Detection Transformer
CV and Pattern Recognition
Finds objects and names them faster than before.
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
CV and Pattern Recognition
Makes self-driving cars see better in new places.