Humor in Pixels: Benchmarking Large Multimodal Models' Understanding of Online Comics
By: Yuriel Ryan, Rui Yang Tan, Kenny Tsu Wei Choo, and others
Potential Business Impact:
Helps computers understand humor in multi-panel comics.
Understanding humor is a core aspect of social intelligence, yet it remains a significant challenge for Large Multimodal Models (LMMs). We introduce PixelHumor, a benchmark dataset of 2,800 annotated multi-panel comics designed to evaluate LMMs' ability to interpret multimodal humor and recognize narrative sequences. Experiments with state-of-the-art LMMs reveal substantial gaps: for instance, top models achieve only 61% accuracy in panel sequencing, far below human performance. This underscores critical limitations in current models' integration of visual and textual cues for coherent narrative and humor understanding. By providing a rigorous framework for evaluating multimodal contextual and narrative reasoning, PixelHumor aims to drive the development of LMMs that better engage in natural, socially aware interactions.
Similar Papers
From Punchlines to Predictions: A Metric to Assess LLM Performance in Identifying Humor in Stand-Up Comedy
Computation and Language
Measures how well AI identifies humor in stand-up comedy.
ComicsPAP: understanding comic strips by picking the correct panel
CV and Pattern Recognition
Helps computers understand comic book stories.
Not All Jokes Land: Evaluating Large Language Models' Understanding of Workplace Humor
Computation and Language
Helps computers understand workplace jokes.