
$\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$: A Large and Diverse Multimodal Benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles

Published: November 3, 2025 | arXiv ID: 2511.01340v1

By: Trishanu Das, Abhilash Nandy, Khush Bajaj and more

Potential Business Impact:

Helps computers solve picture riddles better.

Business Areas:

Visual Search, Internet Services

Rebus Puzzles use pictures, symbols, and letters to represent words or phrases creatively. Understanding them requires a variety of skills, including image recognition, cognitive skills, commonsense reasoning, multi-step reasoning, and image-based wordplay, which makes the task challenging even for current Vision-Language Models. In this paper, we present $\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$, a large and diverse benchmark of $1{,}333$ English Rebus Puzzles covering different artistic styles and levels of difficulty, spread across 18 categories such as food, idioms, sports, finance, and entertainment. We also propose $RebusDescProgICE$, a model-agnostic framework that combines an unstructured description with code-based, structured reasoning and uses improved, reasoning-based in-context example selection. Compared to Chain-of-Thought reasoning, it improves the performance of Vision-Language Models on $\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$ by $2.1$-$4.1\%$ with closed-source models and by $20$-$30\%$ with open-source models.
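The abstract does not say how reasoning-based in-context example selection is implemented. As a rough illustration only, the sketch below ranks a pool of solved puzzles by how much their stored reasoning text overlaps with a draft reasoning for the new puzzle, and keeps the top `k`. The `Example` class, the `select_examples` function, the Jaccard token-overlap score, and the toy puzzle pool are all assumptions made for this example, not the paper's method or data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    """A hypothetical solved puzzle kept in the in-context pool."""
    reasoning: str  # free-text reasoning that led to the answer
    answer: str


def tokens(text: str) -> set[str]:
    """Lowercased, whitespace-split token set (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(query_reasoning: str, pool: list[Example], k: int = 2) -> list[Example]:
    """Return the k pool examples whose reasoning overlaps most with the query."""
    q = tokens(query_reasoning)
    ranked = sorted(pool, key=lambda ex: jaccard(q, tokens(ex.reasoning)), reverse=True)
    return ranked[:k]


# Toy pool, invented for illustration (assumption: not taken from the benchmark).
pool = [
    Example("the word head is placed over the word heels", "head over heels"),
    Example("the word man is written on a picture of the moon", "man on the moon"),
    Example("the letters r e a d are placed between two lines", "read between the lines"),
]

chosen = select_examples("the word mind is placed over the word matter", pool, k=1)
print(chosen[0].answer)
```

A real system would likely replace the token-overlap score with a learned text embedding; the overlap score is used here only to keep the sketch dependency-free.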

Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint

Computation and Language

Computers learn to solve picture riddles.

29 May 2025 0

87%

Reasoning Riddles: How Explainability Reveals Cognitive Limits in Vision-Language Models

CV and Pattern Recognition

Helps computers solve picture puzzles by explaining thinking.

3 Oct 2025 1

87%

Eye-Q: A Multilingual Benchmark for Visual Word Puzzle Solving and Image-to-Phrase Reasoning

CV and Pattern Recognition

Teaches computers to understand pictures like people.

6 Jan 2026 3


Repos / Data Links

github.com

Page Count

7 pages
