Score: 2

WritingBench: A Comprehensive Benchmark for Generative Writing

Published: March 7, 2025 | arXiv ID: 2503.05244v3

By: Yuning Wu, Jiahao Mei, Ming Yan, and more

BigTech Affiliations: Alibaba

Potential Business Impact:

Tests how well AI models handle many kinds of writing, from creative stories to technical documents.

Business Areas:
Blogging Platforms, Content and Publishing, Media and Entertainment

Recent advancements in large language models (LLMs) have significantly enhanced text generation capabilities, yet evaluating their performance in generative writing remains a challenge. Existing benchmarks primarily focus on generic text generation or a limited set of writing tasks, failing to capture the diverse requirements of high-quality written content across various domains. To bridge this gap, we present WritingBench, a comprehensive benchmark designed to evaluate LLMs across 6 core writing domains and 100 subdomains, encompassing creative, persuasive, informative, and technical writing. We further propose a query-dependent evaluation framework that empowers LLMs to dynamically generate instance-specific assessment criteria. This framework is complemented by a fine-tuned critic model for criteria-aware scoring, enabling evaluations of style, format, and length. The framework's validity is further demonstrated by its data curation capability, which enables 7B-parameter models to approach state-of-the-art (SOTA) performance. We open-source the benchmark, along with evaluation tools and modular framework components, to advance the development of LLMs in writing.
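
The abstract describes the query-dependent framework only at a high level, but the flow is roughly two steps: given a writing query, an LLM first drafts instance-specific rubric criteria, and a critic model then scores the candidate text against each criterion. Below is a minimal sketch of that loop; the function names, prompts, and the `chat` stub are illustrative assumptions, not WritingBench's actual open-sourced API.

```python
import json

def chat(model: str, prompt: str) -> str:
    """Placeholder for an LLM call (swap in your own chat-completion client).
    This stub only illustrates the data flow, not a real API."""
    raise NotImplementedError

def generate_criteria(query: str, n: int = 5) -> list[dict]:
    """Step 1 (query-dependent evaluation): ask an LLM to derive
    instance-specific assessment criteria for this writing task."""
    prompt = (
        f"Writing task: {query}\n"
        f"Propose {n} assessment criteria as a JSON list, each with "
        '"name" and "description" fields.'
    )
    return json.loads(chat(model="criteria-llm", prompt=prompt))

def score_response(query: str, response: str, criteria: list[dict]) -> float:
    """Step 2 (criteria-aware scoring): a critic model rates the
    response 1-10 on each criterion; the final score is the mean."""
    scores = []
    for c in criteria:
        prompt = (
            f"Task: {query}\nResponse: {response}\n"
            f"Criterion: {c['name']} - {c['description']}\n"
            "Return only an integer score from 1 to 10."
        )
        scores.append(int(chat(model="critic-model", prompt=prompt)))
    return sum(scores) / len(scores)

# Usage (hypothetical): criteria are regenerated per query, so task-specific
# style, format, and length requirements show up in the rubric itself.
# criteria = generate_criteria("Write a persuasive op-ed on urban transit")
# print(score_response(query, draft, criteria))
```

Regenerating the rubric per query is what distinguishes this setup from fixed-rubric benchmarks: a poem and a technical spec get judged on different, task-appropriate axes rather than one generic checklist.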

Country of Origin
🇨🇳 China

Page Count
18 pages

Category
Computer Science:
Artificial Intelligence