Multi-Track Multimodal Learning on iMiGUE: Micro-Gesture and Emotion Recognition

Published: December 29, 2025 | arXiv ID: 2512.23291v1

By: Arman Martirosyan, Shahane Tigranyan, Maria Razzhivina and more

Potential Business Impact:

Lets computers understand your feelings and tiny movements.

Business Areas:

Image Recognition, Data and Analytics, Software

Micro-gesture recognition and behavior-based emotion prediction are both highly challenging tasks that require modeling subtle, fine-grained human behaviors, primarily leveraging video and skeletal pose data. In this work, we present two multimodal frameworks designed to tackle both problems on the iMiGUE dataset. For micro-gesture classification, we explore the complementary strengths of RGB and 3D pose-based representations to capture nuanced spatio-temporal patterns. To comprehensively represent gestures, video and skeletal embeddings are extracted using MViTv2-S and 2s-AGCN, respectively, and then integrated through a Cross-Modal Token Fusion module that combines spatial and pose information. For emotion recognition, our framework extends to behavior-based emotion prediction, a binary classification task that identifies emotional states from visual cues. We leverage facial and contextual embeddings extracted using the SwinFace and MViTv2-S models and fuse them through an InterFusion module designed to capture emotional expressions and body gestures. Experiments conducted on the iMiGUE dataset, within the scope of the MiGA 2025 Challenge, demonstrate the robust performance and accuracy of our method in the behavior-based emotion prediction task, where our approach secured 2nd place.
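The abstract does not specify how the Cross-Modal Token Fusion module works internally. To illustrate the general idea of fusing video tokens with pose tokens, here is a minimal sketch that assumes a plain scaled dot-product cross-attention with identity projections, a residual connection, and mean pooling. The function names (`cross_attend`, `fuse_tokens`), the toy token values, and the requirement that both modalities share one embedding size are all assumptions made for illustration, not the authors' design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, context):
    """Let every query token attend over all context tokens.

    Assumption: identity query/key/value projections, so both token
    sets must share the same embedding size. A trained model would
    learn these projections instead.
    """
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in context])
        attended.append([
            sum(w * k[d] for w, k in zip(weights, context))
            for d in range(len(q))
        ])
    return attended


def fuse_tokens(video_tokens, pose_tokens):
    """Hypothetical fusion: add pose context to each video token, then mean-pool."""
    attended = cross_attend(video_tokens, pose_tokens)
    fused = [
        [v + a for v, a in zip(v_tok, a_tok)]
        for v_tok, a_tok in zip(video_tokens, attended)
    ]
    dim = len(fused[0])
    return [sum(tok[d] for tok in fused) / len(fused) for d in range(dim)]


# Toy stand-ins for backbone outputs (not real MViTv2-S or 2s-AGCN features).
video_tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
pose_tokens = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]

clip_embedding = fuse_tokens(video_tokens, pose_tokens)
```

In a full pipeline, a vector like `clip_embedding` would feed a classification head over the micro-gesture classes. The same pattern could in principle pair facial and contextual embeddings for the emotion track, although the abstract does not say whether InterFusion uses attention at all.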

Towards Fine-Grained Emotion Understanding via Skeleton-Based Micro-Gesture Recognition

CV and Pattern Recognition

Reads tiny hand movements to guess hidden feelings.

15 Jun 2025 1

91%

MM-Gesture: Towards Precise Micro-Gesture Recognition through Multimodal Fusion

CV and Pattern Recognition

Recognizes tiny hand movements from many video types.

11 Jul 2025 3

89%

Hybrid-supervised Hypergraph-enhanced Transformer for Micro-gesture Based Emotion Recognition

CV and Pattern Recognition

Reads your hidden feelings from tiny movements.

20 Jul 2025 1

Country of Origin

🇦🇲 🇷🇺 Armenia, Russian Federation

Page Count

8 pages
