Oogiri-Master: Benchmarking Humor Understanding via Oogiri
By: Soichiro Murakami, Hidetaka Kamigaito, Hiroya Takamura, and more
Potential Business Impact:
Teaches computers to judge which jokes are funny.
Humor is a salient testbed for human-like creative thinking in large language models (LLMs). We study humor through the Japanese creative-response game Oogiri, in which participants produce witty responses to a given prompt, and ask: what makes such responses funny to humans? Previous work offers only limited means to answer this question reliably: existing datasets contain few candidate responses per prompt, expose popularity signals during rating, and lack objective, comparable metrics of funniness. We therefore introduce Oogiri-Master and Oogiri-Corpus, a benchmark and a dataset designed to enable rigorous evaluation of humor understanding in LLMs. Each prompt is paired with approximately 100 diverse candidate responses, and funniness is rated independently by approximately 100 human judges who cannot see others' ratings, reducing popularity bias and enabling robust aggregation. Using Oogiri-Corpus, we quantitatively analyze the linguistic factors associated with funniness, such as text length, ambiguity, and incongruity resolution, and derive objective metrics that predict human judgments. We then benchmark a range of LLMs and human baselines on Oogiri-Master, showing that state-of-the-art models approach human performance and that insight-augmented prompting further improves model performance. Our results provide a principled basis for evaluating and advancing humor understanding in LLMs.
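The pipeline the abstract describes, aggregating many independent judge ratings per response and predicting funniness from linguistic features, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rating scale, the toy feature set, and all function names (`aggregate_ratings`, `fit_funniness_predictor`, etc.) are hypothetical.

```python
# Minimal sketch of the evaluation pipeline described in the abstract.
# Assumptions (not from the paper): a 1-5 rating scale, toy linguistic
# features, and ordinary least squares as the predictor.
from statistics import mean

import numpy as np


def aggregate_ratings(ratings_per_response):
    """Average the independent judge ratings for each candidate response.

    Judges rate without seeing others' scores, so a simple mean is a
    reasonable popularity-bias-free aggregate.
    """
    return [mean(r) for r in ratings_per_response]


def linguistic_features(responses):
    """Toy proxies for factors the paper analyzes (e.g., text length)."""
    return np.array(
        [[len(text), text.count(" ") + 1] for text in responses], dtype=float
    )


def fit_funniness_predictor(features, mean_ratings):
    """Least-squares fit of aggregated funniness on linguistic features."""
    X = np.hstack([features, np.ones((len(features), 1))])  # add intercept
    coef, *_ = np.linalg.lstsq(X, np.array(mean_ratings), rcond=None)
    return coef


# Example: two candidate responses to one prompt, three judges each.
responses = ["a short quip", "a considerably longer and more elaborate punchline"]
ratings = [[2, 3, 2], [4, 5, 4]]  # each judge rates independently
coef = fit_funniness_predictor(linguistic_features(responses), aggregate_ratings(ratings))
print("fitted coefficients (chars, words, intercept):", coef)
```

In practice one would use far richer features (ambiguity, incongruity-resolution signals) and many more judges per response, but the shape of the computation is the same: independent ratings in, an aggregate score per response, and a model relating linguistic properties to that score.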
Similar Papers
Assessing the Capabilities of LLMs in Humor: A Multi-dimensional Analysis of Oogiri Generation and Evaluation
Computation and Language
Helps machines understand and create funnier jokes.
A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models
Artificial Intelligence
Tests computers' creative and humorous thinking.
From Punchlines to Predictions: A Metric to Assess LLM Performance in Identifying Humor in Stand-Up Comedy
Computation and Language
Measures how well AI identifies humor in comedy.