Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach
By: Ruizhe Li, Chiwei Zhu, Benfeng Xu, and more
Potential Business Impact:
Automatically tests whether machine-written stories are creative.
Creative writing is a key capability of Large Language Models (LLMs), with potential applications in literature, storytelling, and various creative domains. However, evaluating the creativity of machine-generated texts remains a significant challenge, as existing methods either rely on costly manual annotations or fail to align closely with human assessments. In this paper, we propose an effective automated evaluation method based on the Torrance Tests of Creative Writing (TTCW), which evaluate creativity as a product. Our method employs a reference-based Likert-style approach, scoring generated creative texts relative to high-quality reference texts across the various tests. Experimental results demonstrate that our method significantly improves the alignment between LLM evaluations and human assessments, achieving a pairwise accuracy of 0.75 (+15%).
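The abstract describes the method only at a high level; the Python sketch below illustrates one plausible reading of it, assuming an LLM judge is prompted with a reference text and a candidate text, returns a 1-5 Likert rating, and is compared against human judgments via pairwise accuracy. All identifiers (LIKERT_PROMPT, score_story, pairwise_accuracy, judge_fn) are hypothetical and not taken from the paper.

```python
# Minimal sketch of a reference-based Likert-style creativity evaluation
# and a pairwise-accuracy check against human scores.
# Names and prompt wording are illustrative assumptions, not the paper's.

from itertools import combinations
from typing import Callable, Dict

LIKERT_PROMPT = """You are evaluating the creativity of a story.
Reference story (high-quality human writing):
{reference}

Candidate story:
{candidate}

On a 1-5 Likert scale, where 3 means "about as creative as the reference",
rate the candidate for the test: {test_name}.
Answer with a single integer."""


def score_story(
    candidate: str,
    reference: str,
    test_name: str,
    judge_fn: Callable[[str], str],
) -> int:
    """Ask an LLM judge to rate the candidate relative to the reference."""
    prompt = LIKERT_PROMPT.format(
        reference=reference, candidate=candidate, test_name=test_name
    )
    reply = judge_fn(prompt)
    digits = [c for c in reply if c.isdigit()]
    return int(digits[0]) if digits else 3  # fall back to the neutral point


def pairwise_accuracy(
    auto_scores: Dict[str, float], human_scores: Dict[str, float]
) -> float:
    """Fraction of story pairs ranked in the same order by both raters."""
    agree, total = 0, 0
    for a, b in combinations(sorted(auto_scores), 2):
        auto_diff = auto_scores[a] - auto_scores[b]
        human_diff = human_scores[a] - human_scores[b]
        if human_diff == 0:  # skip pairs the human raters tied
            continue
        total += 1
        if auto_diff * human_diff > 0:
            agree += 1
    return agree / total if total else 0.0


if __name__ == "__main__":
    # Toy judge that always answers "4", just to keep the sketch runnable.
    toy_judge = lambda prompt: "4"
    print(score_story("Once upon a time...", "A reference story.", "Originality", toy_judge))
    print(pairwise_accuracy({"s1": 4, "s2": 2}, {"s1": 5, "s2": 1}))
```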
Similar Papers
A Comparative Approach to Assessing Linguistic Creativity of Large Language Models and Humans
Computation and Language
Computers invent new words better than people.
Evaluating the Creativity of LLMs in Persian Literary Text Generation
Computation and Language
Computers write creative Persian stories.
Style Over Story: A Process-Oriented Study of Authorial Creativity in Large Language Models
Computation and Language
AI writing tools prefer style over story.