Score: 1

MMAO-Bench: MultiModal All in One Benchmark Reveals Compositional Law between Uni-modal and Omni-modal in OmniModels

Published: October 21, 2025 | arXiv ID: 2510.18915v1

By: Chen Chen, ZeYang Hu, Fengjiao Chen and more

BigTech Affiliations: Meituan

Potential Business Impact:

Tests computers on seeing, hearing, and reading.

Business Areas:
A/B Testing Data and Analytics

Multimodal large language models have been progressing from uni-modal understanding toward unifying visual, audio, and language modalities; such unified models are collectively termed omni models. However, the correlation between uni-modal and omni-modal capabilities remains unclear, and comprehensive evaluation is needed to drive the evolution of omni-model intelligence. In this work, we propose a novel, high-quality, and diverse omni-model benchmark, the MultiModal All in One Benchmark (MMAO-Bench), which effectively assesses both uni-modal and omni-modal understanding capabilities. The benchmark consists of 1880 human-curated samples across 44 task types, together with an innovative multi-step open-ended question type that better assesses complex reasoning tasks. Experimental results show a compositional law between cross-modal and uni-modal performance, and omni-modal capability manifests as a bottleneck effect on weak models while exhibiting synergistic promotion on strong models.

Country of Origin
🇨🇳 China

Page Count
12 pages

Category
Computer Science:
Computation and Language