Fine-Grained Human Pose Editing Assessment via Layer-Selective MLLMs
By: Ningyu Sun, Zhaolin Cai, Zitong Xu and more
Text-guided human pose editing has gained significant traction in AIGC applications. However, it remains plagued by structural anomalies and generative artifacts. Existing evaluation metrics often isolate authenticity detection from quality assessment, failing to provide fine-grained insights into pose-specific inconsistencies. To address these limitations, we introduce HPE-Bench, a specialized benchmark comprising 1,700 standardized samples from 17 state-of-the-art editing models, offering both authenticity labels and multi-dimensional quality scores. Furthermore, we propose a unified framework based on layer-selective multimodal large language models (MLLMs). By employing contrastive LoRA tuning and a novel layer sensitivity analysis (LSA) mechanism, we identify the optimal feature layer for pose evaluation. Our framework achieves superior performance in both authenticity detection and multi-dimensional quality regression, effectively bridging the gap between forensic detection and quality assessment.
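The abstract does not spell out how the layer sensitivity analysis works, so the following is only a minimal sketch of the general idea of picking one feature layer by how well it separates authentic from edited samples. It assumes per-layer features have already been extracted as arrays, and it uses a Fisher-style separability ratio as a stand-in criterion. The names `fisher_score` and `select_layer`, the synthetic data, and the scoring rule itself are all illustrative assumptions, not the paper's method.

```python
import numpy as np


def fisher_score(features, labels):
    """Between-class over within-class scatter for binary labels (0 or 1).

    Assumed stand-in for a layer sensitivity criterion; not from the paper.
    """
    real = features[labels == 0]
    fake = features[labels == 1]
    between = np.sum((real.mean(axis=0) - fake.mean(axis=0)) ** 2)
    within = real.var(axis=0).sum() + fake.var(axis=0).sum()
    return float(between / (within + 1e-12))


def select_layer(layer_features, labels):
    """Return the index of the highest-scoring layer and all layer scores."""
    scores = [fisher_score(f, labels) for f in layer_features]
    return int(np.argmax(scores)), scores


# Synthetic stand-in for per-layer MLLM features: 100 samples, 8 dimensions.
rng = np.random.default_rng(0)
labels = np.array([0] * 50 + [1] * 50)

# Three hypothetical layers; the class shift is largest in layer index 1.
shifts = [0.1, 2.0, 0.5]
layer_features = [
    rng.normal(size=(100, 8)) + shift * labels[:, None] for shift in shifts
]

best, scores = select_layer(layer_features, labels)
print("selected layer:", best)
```

In a real setting the arrays in `layer_features` would come from the hidden states of the tuned model, and the selection would be made on held-out data rather than on the samples used for tuning.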
Similar Papers
HMVLM: Human Motion-Vision-Language Model via MoE LoRA
CV and Pattern Recognition
Teaches computers to understand and create human movement.
PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment
CV and Pattern Recognition
Helps computers understand body poses from pictures.
LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs
CV and Pattern Recognition
Helps computers judge edited pictures better.