Autonomous Construction-Site Safety Inspection Using Mobile Robots: A Multilayer VLM-LLM Pipeline

Published: December 16, 2025 | arXiv ID: 2512.13974v1

By: Hossein Naderi, Alireza Shojaei, Philip Agee and more

Construction safety inspection remains mostly manual, and automated approaches still rely on task-specific datasets that are hard to maintain in fast-changing construction environments because they require frequent retraining. Meanwhile, field inspection with robots still depends on human teleoperation and manual reporting, which are labor-intensive. This paper aims to connect what a robot sees during autonomous navigation to the safety rules that are common on construction sites, automatically generating a safety inspection report. To this end, we propose a multi-layer framework with two main modules: robotics and AI. On the robotics side, SLAM and autonomous navigation provide repeatable coverage and targeted revisits via waypoints. On the AI side, a Vision Language Model (VLM)-based layer produces scene descriptions; a retrieval component grounds those descriptions in OSHA and site policies; another VLM-based layer assesses the safety situation based on those rules; and finally, a Large Language Model (LLM) layer generates safety reports based on the previous outputs. The framework is validated with a proof-of-concept implementation and evaluated in a lab environment that simulates common hazards across three scenarios. Results show high recall with competitive precision compared to state-of-the-art closed-source models. This paper contributes a transparent, generalizable pipeline that moves beyond black-box models by exposing intermediate artifacts from each layer and keeping the human in the loop. This work provides a foundation for future extensions to additional tasks and settings within and beyond the construction context.
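
The four AI layers described in the abstract can be pictured as a chain of functions whose intermediate outputs are all kept for human review. The following is a minimal sketch under stated assumptions, not the authors' implementation: every name here (`InspectionRecord`, `describe_scene`, `retrieve_rules`, `assess`, `generate_report`, `run_pipeline`) is hypothetical, the rule texts are invented placeholders rather than OSHA wording, and the VLM and LLM calls are replaced by simple stand-ins so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InspectionRecord:
    """Holds the intermediate artifact from each layer for one waypoint."""
    waypoint: str
    description: str = ""
    rules: List[str] = field(default_factory=list)
    assessment: str = ""
    report: str = ""


# Placeholder rule snippets (assumption: NOT actual OSHA or site-policy text).
RULES = [
    "Workers must wear a hard hat in areas with overhead hazards.",
    "A ladder must be secured before use.",
    "Walkways must be kept clear of debris and cables.",
]


def _tokens(text: str) -> set:
    """Lowercase the text, drop periods, and split into a set of words."""
    return set(text.lower().replace(".", "").split())


def describe_scene(image_path: str) -> str:
    """Layer 1 stand-in: a real system would query a VLM with the image."""
    return "A worker without a hard hat stands near an unsecured ladder."


def retrieve_rules(description: str, rules: List[str], top_k: int = 2) -> List[str]:
    """Layer 2 stand-in: crude word-overlap retrieval instead of a real retriever."""
    words = _tokens(description)
    scored = [(len(words & _tokens(rule)), rule) for rule in rules]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rule for score, rule in scored[:top_k] if score > 0]


def assess(description: str, rules: List[str]) -> str:
    """Layer 3 stand-in: a real system would ask a second VLM to judge the scene."""
    if not rules:
        return "No applicable rule retrieved; flag for human review."
    return "Check against: " + " | ".join(rules)


def generate_report(record: InspectionRecord) -> str:
    """Layer 4 stand-in: a real system would prompt an LLM with prior outputs."""
    return (
        f"Waypoint: {record.waypoint}\n"
        f"Observation: {record.description}\n"
        f"Assessment: {record.assessment}"
    )


def run_pipeline(frames: Dict[str, str]) -> List[InspectionRecord]:
    """Run all four layers per waypoint and keep every intermediate output."""
    records = []
    for waypoint, image_path in frames.items():
        rec = InspectionRecord(waypoint=waypoint)
        rec.description = describe_scene(image_path)
        rec.rules = retrieve_rules(rec.description, RULES)
        rec.assessment = assess(rec.description, rec.rules)
        rec.report = generate_report(rec)
        records.append(rec)
    return records


if __name__ == "__main__":
    for rec in run_pipeline({"wp1": "frame_001.jpg"}):
        print(rec.report)
```

Keeping each layer's output on the record, rather than returning only the final report, mirrors the abstract's emphasis on exposing intermediate artifacts so that a human reviewer can see where a conclusion came from.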

Using Vision Language Models for Safety Hazard Identification in Construction

CV and Pattern Recognition

Finds hidden dangers on building sites.

12 Apr 2025 1

90%

Automating construction safety inspections using a multi-modal vision-language RAG framework

CV and Pattern Recognition

Helps build sites automatically check for safety.

5 Oct 2025 0

90%

Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?

CV and Pattern Recognition

Helps computers spot building site dangers.

14 Aug 2025 3
