Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations

Published: August 7, 2025 | arXiv ID: 2508.05470v1

By: Li-Chun Lu, Miri Liu, Pin-Chun Lu and more

Potential Business Impact:

Helps computers judge creative ideas more like humans.

We systematically examine, analyze, and compare representative creativity measures (creativity index, perplexity, syntactic templates, and LLM-as-a-Judge) across diverse creative domains, including creative writing, unconventional problem-solving, and research ideation. Our analyses reveal that these metrics exhibit limited consistency, capturing different dimensions of creativity. We highlight key limitations, including the creativity index's focus on lexical diversity, perplexity's sensitivity to model confidence, and syntactic templates' inability to capture conceptual creativity. Additionally, LLM-as-a-Judge shows instability and bias. Our findings underscore the need for more robust, generalizable evaluation frameworks that better align with human judgments of creativity.
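As a rough illustration of why perplexity tracks model confidence, the sketch below computes perplexity from per-token log-probabilities using the standard definition (the exponential of the negative mean log-probability). The function name `perplexity` and the probability values are assumptions invented for this example; they are not taken from the paper, whose exact scoring setup is not described here.

```python
import math


def perplexity(token_logprobs):
    """Return exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical per-token probabilities assigned to the SAME text by two
# scoring models. The text is identical; only the scorer's confidence differs.
confident = [math.log(p) for p in (0.9, 0.8, 0.9, 0.85)]
hesitant = [math.log(p) for p in (0.3, 0.2, 0.4, 0.25)]

ppl_confident = perplexity(confident)
ppl_hesitant = perplexity(hesitant)

# The less confident scorer yields the higher perplexity for the same text.
print(ppl_confident < ppl_hesitant)
```

Because the score depends on the scoring model's probabilities and not only on the text, the same passage can look more or less "surprising" depending on which model evaluates it, which is one way to read the sensitivity the abstract describes.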

CreativityPrism: A Holistic Benchmark for Large Language Model Creativity

Computation and Language

Tests if AI is creative in many ways.

23 Oct 2025 1

90%

Deep Associations, High Creativity: A Simple yet Effective Metric for Evaluating Large Language Models

Computation and Language

Tests AI's imagination like a human.

14 Oct 2025 2

89%

Style Over Story: A Process-Oriented Study of Authorial Creativity in Large Language Models

Computation and Language

AI writing tools prefer style over story.

2 Oct 2025 1


Country of Origin

🇺🇸 United States

Page Count

15 pages
