UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits
By: Keming Ye, Zhipeng Huang, Canmiao Fu, and more
Potential Business Impact:
Helps AI understand and fix pictures better.
With the rapid advances of powerful multimodal models such as GPT-4o, Nano Banana, and Seedream 4.0 in image editing, the performance gap between closed-source and open-source models is widening. This gap stems primarily from the scarcity of large-scale, high-quality training data and of comprehensive benchmarks capable of diagnosing model weaknesses across diverse editing behaviors. Existing data construction methods face a scale-quality trade-off: human annotations are high-quality but not scalable, while automated pipelines suffer from error propagation and noise. To address this, we introduce a lightweight data pipeline that replaces multi-tool chains with an end-to-end model and a unified post-verification stage. For scalable quality control, we train a 7B dual-task expert model, Qwen-Verify, for efficient failure detection and instruction recaptioning. This pipeline yields UnicEdit-10M, a 10M-scale dataset spanning diverse basic and complex editing tasks. We also propose UnicBench, a general benchmark that extends beyond basic edits to explicitly assess spatial and knowledge-driven reasoning. To enable fine-grained diagnosis, we introduce novel metrics, including Non-edit Consistency and Reasoning Accuracy. Our analysis of mainstream models on UnicBench reveals their limitations and provides clear directions for future research.
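The unified post-verification stage can be pictured with a minimal Python sketch, written under stated assumptions. The summary does not describe Qwen-Verify's real interface, so `EditSample`, `Verdict`, `post_verify`, and `toy_verifier` are hypothetical names. The rule that failed edits are discarded rather than repaired is also an assumption. The toy verifier is only a stand-in for the trained 7B dual-task model: one output plays the failure-detection role, the other the instruction-recaptioning role.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, NamedTuple


class Verdict(NamedTuple):
    """Hypothetical output of a dual-task verifier for one edit sample."""
    success: bool    # failure-detection task: did the edit succeed?
    recaption: str   # recaptioning task: instruction rewritten to match the edit


@dataclass(frozen=True)
class EditSample:
    source_image: str  # path or ID of the input image
    instruction: str   # editing instruction given to the end-to-end editor
    edited_image: str  # path or ID of the generated output


# Any callable mapping a sample to a Verdict; in the paper this role is
# filled by Qwen-Verify, whose actual API is not shown in the summary.
Verifier = Callable[[EditSample], Verdict]


def post_verify(samples: Iterable[EditSample], verifier: Verifier) -> List[EditSample]:
    """Single verification pass: drop failed edits, recaption the survivors."""
    kept: List[EditSample] = []
    for sample in samples:
        verdict = verifier(sample)
        if not verdict.success:
            continue  # assumption: failed edits are discarded, not repaired
        # Keep the images but swap in the recaptioned instruction.
        kept.append(replace(sample, instruction=verdict.recaption))
    return kept


def toy_verifier(sample: EditSample) -> Verdict:
    """Toy stand-in for a learned verifier (hypothetical rule, for illustration).

    Outputs whose ID ends in "_bad" count as failures; the instruction is
    "recaptioned" by stripping whitespace and capitalizing it.
    """
    return Verdict(
        success=not sample.edited_image.endswith("_bad"),
        recaption=sample.instruction.strip().capitalize(),
    )


if __name__ == "__main__":
    batch = [
        EditSample("a.jpg", "make the sky red ", "a_out"),
        EditSample("b.jpg", "remove the cat", "b_out_bad"),
    ]
    for s in post_verify(batch, toy_verifier):
        print(s)
```

Keeping both tasks behind one verifier call mirrors the abstract's point that a single post-verification stage replaces per-tool checks along a multi-tool chain.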
Similar Papers
UniREditBench: A Unified Reasoning-based Image Editing Benchmark
CV and Pattern Recognition
Helps AI edit pictures better by reasoning more.
UniEdit: A Unified Knowledge Editing Benchmark for Large Language Models
Computation and Language
Makes AI smarter and more truthful everywhere.
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
CV and Pattern Recognition
Teaches computers to change pictures with words.