I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models
By: Juntong Wang, Jiarui Wang, Huiyu Duan, and more
Potential Business Impact:
Enables faster, fairer, and more comprehensive testing of AI image-editing models.
Image editing models are advancing rapidly, yet comprehensive evaluation remains a significant challenge. Existing image editing benchmarks generally suffer from limited task scopes, insufficient evaluation dimensions, and heavy reliance on manual annotation, which significantly constrains their scalability and practical applicability. To address this, we propose I2I-Bench, a comprehensive benchmark for image-to-image editing models, which features (i) diverse tasks, encompassing 10 task categories spanning both single-image and multi-image editing; (ii) comprehensive evaluation, covering 30 decoupled, fine-grained evaluation dimensions scored by automated hybrid methods that integrate specialized tools and large multimodal models (LMMs); and (iii) rigorous alignment validation, verifying the consistency between our benchmark's evaluations and human preferences. Using I2I-Bench, we benchmark numerous mainstream image editing models and investigate their gaps and trade-offs across the various dimensions. We will open-source all components of I2I-Bench to facilitate future research.
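The hybrid evaluation idea described above — routing each decoupled dimension to either a specialized tool or an LMM judge, then collecting per-dimension scores — can be sketched roughly as follows. This is a minimal illustration, not the I2I-Bench implementation: the dimension names, scorer functions, and score values are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch of a hybrid per-dimension evaluator. Each evaluation
# dimension is scored either by a specialized tool (e.g. a perceptual-
# similarity metric) or by an LMM judge. Names and values are illustrative
# only, not drawn from the I2I-Bench release.

@dataclass
class Dimension:
    name: str
    # Scorer maps (source image, edited image, instruction) to a score in [0, 1].
    scorer: Callable[[str, str, str], float]

def tool_scorer(source_img: str, edited_img: str, instruction: str) -> float:
    """Stand-in for a tool-based metric (e.g. background-similarity check)."""
    return 0.8  # placeholder value

def lmm_scorer(source_img: str, edited_img: str, instruction: str) -> float:
    """Stand-in for an LMM-judge call rating instruction adherence."""
    return 0.6  # placeholder value

def evaluate(sample: Dict[str, str], dimensions: List[Dimension]) -> Dict[str, float]:
    """Score one editing result along every decoupled dimension."""
    return {
        d.name: d.scorer(sample["source"], sample["edited"], sample["instruction"])
        for d in dimensions
    }

dims = [
    Dimension("background_preservation", tool_scorer),   # tool-based dimension
    Dimension("instruction_following", lmm_scorer),      # LMM-judged dimension
]
sample = {"source": "a.png", "edited": "b.png", "instruction": "add a hat"}
print(evaluate(sample, dims))
```

Keeping dimensions decoupled like this means each score can be reported, audited, and aligned against human preferences independently, rather than collapsing everything into a single opaque quality number.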
Similar Papers
GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing
CV and Pattern Recognition
Tests if computer image edits match words.
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
CV and Pattern Recognition
Tests how well AI edits videos from text.
UniREditBench: A Unified Reasoning-based Image Editing Benchmark
CV and Pattern Recognition
Makes pictures edit better with more thinking.