
AECV-Bench: Benchmarking Multimodal Models on Architectural and Engineering Drawings Understanding

Published: January 8, 2026 | arXiv ID: 2601.04819v1

By: Aleksei Kondratenko, Mussie Birhane, Houssame E. Hsain and more

Potential Business Impact:

Computers learn to read building plans better.

Business Areas:

Image Recognition Data and Analytics, Software

Architecture, engineering, and construction (AEC) drawings encode geometry and semantics through symbols, layout conventions, and dense annotation, yet it remains unclear whether modern multimodal and vision-language models can reliably interpret this graphical language. We present AECV-Bench, a benchmark for evaluating multimodal and vision-language models on realistic AEC artefacts via two complementary use cases: (i) object counting on 120 high-quality floor plans (doors, windows, bedrooms, toilets), and (ii) drawing-grounded document QA spanning 192 question-answer pairs that test text extraction (OCR), instance counting, spatial reasoning, and comparative reasoning over common drawing regions. Object-counting performance is reported as per-field exact-match accuracy and mean absolute percentage error (MAPE), while document-QA performance is reported as overall accuracy with per-category breakdowns, scored by an LLM-as-a-judge pipeline with targeted human adjudication for edge cases. Evaluating a broad set of state-of-the-art models under a unified protocol, we observe a stable capability gradient: OCR and text-centric document QA are strongest (up to 0.95 accuracy), spatial reasoning is moderate, and symbol-centric drawing understanding, especially reliable counting of doors and windows, remains unsolved (often 0.40-0.55 accuracy) with substantial proportional errors. These results suggest that current systems function well as document assistants but lack robust drawing literacy, motivating domain-specific representations and tool-augmented, human-in-the-loop workflows for efficient AEC automation.
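
The two object-counting metrics named in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' evaluation code: the function name `per_field_metrics`, the dictionary layout, and the choice to skip plans whose ground-truth count is zero when averaging percentage error are all assumptions, since the abstract does not specify these details.

```python
from typing import Dict, List

# Field names taken from the abstract; the data layout is an assumption.
FIELDS = ["doors", "windows", "bedrooms", "toilets"]


def per_field_metrics(
    truth: List[Dict[str, int]],
    pred: List[Dict[str, int]],
    fields: List[str] = FIELDS,
) -> Dict[str, Dict[str, float]]:
    """Return exact-match accuracy and MAPE (in percent) for each field.

    `truth` and `pred` hold one dict of counts per floor plan, in the same order.
    """
    if not truth or len(truth) != len(pred):
        raise ValueError("truth and pred must be non-empty and equally long")

    metrics: Dict[str, Dict[str, float]] = {}
    for field in fields:
        exact = 0
        pct_errors: List[float] = []
        for t, p in zip(truth, pred):
            if t[field] == p[field]:
                exact += 1
            # Assumption: plans with a ground-truth count of zero are left out
            # of MAPE, because the percentage error is undefined there.
            if t[field] != 0:
                pct_errors.append(abs(p[field] - t[field]) / t[field])
        mape = 100 * sum(pct_errors) / len(pct_errors) if pct_errors else float("nan")
        metrics[field] = {"accuracy": exact / len(truth), "mape": mape}
    return metrics


if __name__ == "__main__":
    # Two made-up floor plans, for illustration only.
    truth = [
        {"doors": 4, "windows": 5, "bedrooms": 2, "toilets": 1},
        {"doors": 8, "windows": 10, "bedrooms": 3, "toilets": 2},
    ]
    pred = [
        {"doors": 4, "windows": 4, "bedrooms": 2, "toilets": 1},
        {"doors": 6, "windows": 10, "bedrooms": 3, "toilets": 1},
    ]
    for name, values in per_field_metrics(truth, pred).items():
        print(name, values)
```

Reporting both numbers is useful because exact-match accuracy treats a miss by one door the same as a miss by ten, whereas MAPE captures how large the miss is relative to the true count.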

AECBench: A Hierarchical Benchmark for Knowledge Evaluation of Large Language Models in the AEC Field

Computation and Language

Tests if AI can safely design buildings.

23 Sep 2025 0

88%

Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency

CV and Pattern Recognition

Tests if computers can do math with pictures.

24 Apr 2025 1

88%

VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding

CV and Pattern Recognition

Helps computers find answers in any language document.

10 Aug 2025 1


Repos / Data Links

github.com

Page Count

23 pages
