Score: 1

What Makes a Good Generated Image? Investigating Human and Multimodal LLM Image Preference Alignment

Published: September 16, 2025 | arXiv ID: 2509.12750v1

By: Rishab Parthasarathy, Jasmine Collins, Cory Stephenson

BigTech Affiliations: Databricks, Massachusetts Institute of Technology

Potential Business Impact:

Helps AI systems judge what makes generated images look good to people.

Business Areas:
Visual Search, Internet Services

Automated evaluation of generative text-to-image models remains a challenging problem. Recent works have proposed using multimodal LLMs to judge the quality of images, but these works offer little insight into how multimodal LLMs make use of concepts relevant to humans, such as image style or composition, to generate their overall assessment. In this work, we study which attributes of an image (specifically aesthetics, lack of artifacts, anatomical accuracy, compositional correctness, object adherence, and style) are important for both LLMs and humans when judging image quality. We first curate a dataset of human preferences using synthetically generated image pairs. We then use the inter-task correlation between each pair of image quality attributes to understand which attributes are related in human judgments. Repeating the same analysis with LLMs, we find that the relationships between image quality attributes are much weaker. Finally, we study individual image quality attributes by generating synthetic datasets with a high degree of control over each axis. Humans can easily judge the quality of an image with respect to each of these attributes (e.g., distinguishing high- from low-aesthetic images); however, we find that some attributes, such as anatomical accuracy, are much more difficult for multimodal LLMs to learn to judge. Taken together, these findings reveal interesting differences between how humans and multimodal LLMs perceive images.
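To make the inter-attribute correlation analysis described above concrete, here is a minimal sketch (not the authors' released code) of how pairwise correlations between attribute-level preference judgments could be computed. It assumes preference labels are stored as one column per attribute, one row per image pair, with values in {0, 1} indicating which image in the pair was preferred; the attribute names, data layout, and use of Spearman correlation are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: inter-attribute correlation over pairwise preferences.
# Assumed data layout: one row per image pair, one {0, 1} column per attribute.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Attribute axes named in the abstract.
ATTRIBUTES = [
    "aesthetics",
    "lack_of_artifacts",
    "anatomical_accuracy",
    "compositional_correctness",
    "object_adherence",
    "style",
]

def inter_attribute_correlations(prefs: pd.DataFrame) -> pd.DataFrame:
    """Spearman correlation between every pair of attribute judgments."""
    n = len(ATTRIBUTES)
    corr = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            rho, _ = spearmanr(prefs[ATTRIBUTES[i]], prefs[ATTRIBUTES[j]])
            corr[i, j] = corr[j, i] = rho
    return pd.DataFrame(corr, index=ATTRIBUTES, columns=ATTRIBUTES)

if __name__ == "__main__":
    # Toy data: 1000 simulated image pairs whose per-attribute judgments
    # loosely share a common preference signal.
    rng = np.random.default_rng(0)
    base = rng.integers(0, 2, size=1000)
    data = {
        attr: np.where(rng.random(1000) < 0.7, base, rng.integers(0, 2, 1000))
        for attr in ATTRIBUTES
    }
    print(inter_attribute_correlations(pd.DataFrame(data)).round(2))
```

Running the same function on human labels and on multimodal-LLM labels would let one compare the two correlation matrices directly, which is the kind of comparison the paper reports: strong inter-attribute relationships for humans, weaker ones for LLMs.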

Country of Origin
🇺🇸 United States

Page Count
27 pages

Category
Computer Science:
Computer Vision and Pattern Recognition