KidsArtBench: Multi-Dimensional Children's Art Evaluation with Attribute-Aware MLLMs

Published: December 14, 2025 | arXiv ID: 2512.12503v1

By: Mingrui Ye, Chanjin Zheng, Zengyi Yu and more

Multimodal Large Language Models (MLLMs) show remarkable progress across many visual-language tasks; however, their capacity to evaluate artistic expression remains limited. Aesthetic concepts are inherently abstract and open-ended, and multimodal artwork annotations are scarce. We introduce KidsArtBench, a new benchmark of over 1k children's artworks (ages 5-15) annotated by 12 expert educators across 9 rubric-aligned dimensions, together with expert comments for feedback. Unlike prior aesthetic datasets that provide single scalar scores on adult imagery, KidsArtBench targets children's artwork and pairs multi-dimensional annotations with comment supervision to enable both ordinal assessment and formative feedback. Building on this resource, we propose an attribute-specific multi-LoRA approach, where each attribute corresponds to a distinct evaluation dimension (e.g., Realism, Imagination) in the scoring rubric, with Regression-Aware Fine-Tuning (RAFT) to align predictions with ordinal scales. On Qwen2.5-VL-7B, our method increases correlation from 0.468 to 0.653, with the largest gains on perceptual dimensions and narrowed gaps on higher-order attributes. These results show that educator-aligned supervision and attribute-aware training yield pedagogically meaningful evaluations and establish a rigorous testbed for sustained progress in educational AI. We release data and code with ethics documentation.
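To make the idea of aligning predictions with an ordinal scale more concrete, here is a minimal sketch of one common regression-aware formulation: the model's logits over candidate score tokens are turned into a probability distribution, the prediction is the expected score under that distribution, and training minimizes the squared error against the educator's rating. This is an illustration under stated assumptions, not the paper's implementation. The abstract does not specify the score range or the exact loss, so the 1 to 5 scale and the function names `expected_score` and `regression_aware_loss` are hypothetical.

```python
import math

# Hypothetical rubric scale; the abstract does not state the actual score range.
SCORES = [1, 2, 3, 4, 5]


def expected_score(logits, scores=SCORES):
    """Softmax over candidate-score logits, then the probability-weighted mean score."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return sum((e / total) * s for e, s in zip(exps, scores))


def regression_aware_loss(logits, target, scores=SCORES):
    """Squared error between the expected score and the annotated rating."""
    return (expected_score(logits, scores) - target) ** 2


if __name__ == "__main__":
    # Illustrative logits for a single attribute (for example, Imagination).
    logits = [0.1, 0.3, 1.2, 2.0, 0.4]
    print("expected score:", expected_score(logits))
    print("loss vs. rating 4:", regression_aware_loss(logits, target=4))
```

Under the attribute-specific setup described above, a loss of this kind would presumably be computed separately for each rubric dimension, with each dimension's adapter trained on its own ratings.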

MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams

Artificial Intelligence

Tests AI on school subjects to make it smarter.

9 Aug 2025 1

89%

AesBiasBench: Evaluating Bias and Alignment in Multimodal Language Models for Personalized Image Aesthetic Assessment

Computation and Language

Finds if AI unfairly judges pictures based on who made them.

15 Sep 2025 0

89%

VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Heritage

CV and Pattern Recognition

Tests if computers truly understand art.

14 Oct 2025 2
