
EmoCaliber: Advancing Reliable Visual Emotion Comprehension via Confidence Verbalization and Calibration

Published: December 17, 2025 | arXiv ID: 2512.15528v1

By: Daiqing Wu, Dongbao Yang, Can Ma, Yu Zhou

Potential Business Impact:

Helps computers show how sure they are about emotions.

Business Areas:

Image Recognition, Data and Analytics, Software

Visual Emotion Comprehension (VEC) aims to infer sentiment polarities or emotion categories from affective cues embedded in images. In recent years, Multimodal Large Language Models (MLLMs) have established a popular paradigm in VEC, leveraging their generalizability to unify VEC tasks defined under diverse emotion taxonomies. While this paradigm achieves notable success, it typically formulates VEC as a deterministic task, requiring the model to output a single, definitive emotion label for each image. Such a formulation insufficiently accounts for the inherent subjectivity of emotion perception, overlooking alternative interpretations that may be equally plausible to different viewers. To address this limitation, we propose equipping MLLMs with the capability to verbalize their confidence in emotion predictions. This additional signal provides users with an estimate of both the plausibility of alternative interpretations and the MLLMs' self-assessed competence, thereby enhancing reliability in practice. Building on this insight, we introduce a three-stage training framework that progressively endows MLLMs with structured reasoning, teaches them to verbalize confidence, and calibrates their confidence expression, culminating in EmoCaliber, a confidence-aware MLLM for VEC. Through fair and comprehensive evaluations on the unified benchmark VECBench, EmoCaliber demonstrates overall superiority over existing methods in both emotion prediction and confidence estimation. These results validate the effectiveness of our approach and mark a feasible step toward more reliable VEC systems. Project page: https://github.com/wdqqdw/EmoCaliber.
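
The abstract does not say which metric is used to evaluate confidence estimation. As an illustration only, one widely used measure of calibration is the Expected Calibration Error (ECE), which bins predictions by stated confidence and compares each bin's average confidence with its accuracy. The sketch below assumes verbalized confidences have already been parsed into numbers in [0, 1]; the function name, argument names, and bin count are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Illustrative ECE: weighted mean |avg confidence - accuracy| over bins.

    Assumption: `confidences` are the model's stated confidences in [0, 1]
    and `correct` marks whether each emotion prediction matched the label.
    This is a generic metric, not necessarily the one used by EmoCaliber.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        # Each bin contributes its confidence/accuracy gap, weighted by its share of samples.
        ece += (len(items) / n) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Hypothetical toy data: two confident hits, one moderately confident hit and miss.
    print(expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, True, True, False]))
```

A lower value means stated confidence tracks accuracy more closely; a model that says 0.6 and is right 60% of the time in that bin contributes nothing to the error.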

EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis

CV and Pattern Recognition

Shows how pictures make people feel.

16 Nov 2025 0

88%

EEmo-Bench: A Benchmark for Multi-modal Large Language Models on Image Evoked Emotion Assessment

Multimedia

Helps computers understand feelings in pictures.

23 Apr 2025 2

88%

TiCAL: Typicality-Based Consistency-Aware Learning for Multimodal Emotion Recognition

CV and Pattern Recognition

Helps computers understand feelings better, even when they disagree.

19 Nov 2025 2


Repos / Data Links

github.com

Page Count

18 pages
