DINO-YOLO: Self-Supervised Pre-training for Data-Efficient Object Detection in Civil Engineering Applications
By: Malaisree P, Youwai S, Kitkobsin T, et al.
Potential Business Impact:
Automatically detects cracks and personal protective equipment (PPE) in construction-site photos.
Object detection in civil engineering applications is constrained by limited annotated data in specialized domains. We introduce DINO-YOLO, a hybrid architecture that combines YOLOv12 with DINOv3 self-supervised vision transformers for data-efficient detection. DINOv3 features are strategically integrated at two locations: input preprocessing (P0) and mid-backbone enhancement (P3). Experimental validation demonstrates substantial gains: Tunnel Segment Crack detection (648 images) improves by 12.4%, Construction PPE (1K images) by 13.7%, and KITTI (7K images) by 88.6%, while real-time inference is maintained (30-47 FPS). A systematic ablation across five YOLO scales and nine DINOv3 variants reveals that Medium-scale architectures achieve optimal performance with DualP0P3 integration (55.77% mAP@0.5), while the Small scale requires Triple Integration (53.63%). The 2-4x inference overhead (21-33 ms versus an 8-16 ms baseline) remains acceptable for field deployment on an NVIDIA RTX 5090. DINO-YOLO establishes state-of-the-art performance on civil engineering datasets (<10K images) while preserving computational efficiency, providing practical solutions for construction safety monitoring and infrastructure inspection in data-constrained environments.
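To make the DualP0P3 idea concrete: frozen DINOv3 features are injected once at the input stage (P0) and once at a mid-backbone stage (P3). The abstract does not specify the fusion operator or feature dimensions, so the sketch below assumes simple channel-wise concatenation after nearest-neighbour resizing; the function names and shapes are illustrative, not the paper's actual implementation.

```python
import numpy as np

def resize_nearest(feat, out_hw):
    """Nearest-neighbour resize of an (H, W, C) feature map to out_hw."""
    h, w, _ = feat.shape
    oh, ow = out_hw
    rows = np.arange(oh) * h // oh  # map each output row to a source row
    cols = np.arange(ow) * w // ow  # map each output col to a source col
    return feat[rows][:, cols]

def dual_p0_p3_fusion(image, dino_map, p3_map):
    """Illustrative DualP0P3 integration (assumed fusion = concatenation):
    - P0: concatenate resized DINO patch features with the raw image channels.
    - P3: concatenate resized DINO patch features with the mid-backbone map.
    The enlarged channel counts would then feed the next YOLO backbone stage.
    """
    p0 = np.concatenate([image, resize_nearest(dino_map, image.shape[:2])], axis=-1)
    p3 = np.concatenate([p3_map, resize_nearest(dino_map, p3_map.shape[:2])], axis=-1)
    return p0, p3

# Toy shapes: a 64x64 RGB image, an 8-channel DINO patch grid, a 16-channel P3 map.
p0, p3 = dual_p0_p3_fusion(
    np.zeros((64, 64, 3)), np.zeros((4, 4, 8)), np.zeros((8, 8, 16))
)
print(p0.shape, p3.shape)  # (64, 64, 11) (8, 8, 24)
```

The channel counts grow by the DINO feature width at each injection point, which is consistent with the reported 2-4x inference overhead: the extra channels (and the frozen encoder's forward pass) add cost but leave the detector real-time.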