Evaluation Framework for AI Creativity: A Case Study Based on Story Generation
By: Pharath Sathya, Yin Jou Huang, Fei Cheng
Potential Business Impact:
Helps AI write stories humans find truly creative.
Evaluating creative text generation remains a challenge because existing reference-based metrics fail to capture the subjective nature of creativity. We propose a structured evaluation framework for AI story generation comprising four components (Novelty, Value, Adherence, and Resonance) and eleven sub-components. Using controlled story generation via "Spike Prompting" and a crowdsourced study of 115 readers, we examine how different creative components shape both immediate and reflective human creativity judgments. Our findings show that creativity is evaluated hierarchically rather than cumulatively, with different dimensions becoming salient at different stages of judgment, and that reflective evaluation substantially alters both ratings and inter-rater agreement. Together, these results support the effectiveness of our framework in revealing dimensions of creativity that are obscured by reference-based evaluation.
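As a rough illustration of the framework's structure (a sketch, not code from the paper), the snippet below encodes the four top-level components and aggregates toy crowdsourced ratings at the two judgment stages the abstract contrasts. The 1-5 scale, the example scores, and the use of rating standard deviation as an inter-rater agreement proxy are all assumptions for illustration; the paper's actual sub-components and agreement measure are not specified here.

# Minimal sketch, assuming a 1-5 rating scale and toy data; not the authors' code.
from statistics import mean, stdev

COMPONENTS = ["Novelty", "Value", "Adherence", "Resonance"]  # named in the abstract

# ratings[stage][component] -> per-reader scores (hypothetical numbers)
ratings = {
    "immediate": {"Novelty": [4, 5, 3, 4], "Value": [3, 3, 4, 3],
                  "Adherence": [4, 4, 4, 5], "Resonance": [2, 4, 3, 5]},
    "reflective": {"Novelty": [4, 4, 4, 4], "Value": [4, 4, 3, 4],
                   "Adherence": [4, 4, 5, 5], "Resonance": [3, 3, 3, 4]},
}

for stage, by_component in ratings.items():
    print(stage)
    for comp in COMPONENTS:
        scores = by_component[comp]
        # A lower standard deviation serves as a crude proxy for higher
        # inter-rater agreement; the paper's actual measure is not stated.
        print(f"  {comp:10s} mean={mean(scores):.2f}  sd={stdev(scores):.2f}")

The toy numbers are chosen so that reflective scores show tighter agreement than immediate ones, loosely mirroring the abstract's claim that reflective evaluation changes both ratings and inter-rater agreement.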
Similar Papers
Evaluating Quality of Gaming Narratives Co-created with AI
Artificial Intelligence
Helps games tell better stories with AI.
A Study on the Framework for Evaluating the Ethics and Trustworthiness of Generative AI
Computers and Society
Makes AI safer and more trustworthy for everyone.
Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach
Computation and Language
Tests stories to see if they are creative.