Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset

Published: December 30, 2025 | arXiv ID: 2512.24160v1

By: TsaiChing Ni, ZhenQi Chen, YuanFu Yang

We present IMDD-1M, the first large-scale Industrial Multimodal Defect Dataset, comprising 1,000,000 aligned image-text pairs and designed to advance multimodal learning for manufacturing and quality inspection. IMDD-1M contains high-resolution images of real-world defects spanning more than 60 material categories and more than 400 defect types, each accompanied by expert-verified annotations and fine-grained textual descriptions detailing defect location, severity, and contextual attributes. The dataset supports a wide range of applications, including classification, segmentation, retrieval, captioning, and generative modeling. Building on IMDD-1M, we train a diffusion-based vision-language foundation model from scratch, specifically tailored to industrial scenarios. The model serves as a generalizable foundation that can be efficiently adapted to specialized domains through lightweight fine-tuning. Using less than 5% of the task-specific data required by dedicated expert models, it achieves comparable performance. This result highlights the potential of data-efficient foundation model adaptation for industrial inspection and generation, and paves the way for scalable, domain-adaptive, and knowledge-grounded manufacturing intelligence.

Resilient Multimodal Industrial Surface Defect Detection with Uncertain Sensors Availability

CV and Pattern Recognition

Finds factory flaws even with broken cameras.

3 Sep 2025 2

88%

A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis

CV and Pattern Recognition

Helps farmers spot plant sickness with pictures and words.

10 Mar 2025 2

88%

A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction

CV and Pattern Recognition

Helps robots understand what workers are doing.

10 Jan 2025 1
