Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition
By: Muzammil Behzad
Potential Business Impact:
Helps computers understand feelings from faces.
In this paper, we introduce MultiviewVLM, a vision-language model designed for unsupervised contrastive multiview representation learning of facial emotions from 3D/4D data. Our architecture integrates pseudo-labels derived from generated textual prompts to guide implicit alignment of emotional semantics. To capture information shared across multiple views, we propose a joint embedding space that aligns multiview representations without requiring explicit supervision. We further enhance the discriminability of our model through a novel multiview contrastive learning strategy that leverages stable positive-negative pair sampling. A gradient-friendly loss function is introduced to promote smoother and more stable convergence, and the model is optimized for distributed training to ensure scalability. Extensive experiments demonstrate that MultiviewVLM outperforms existing state-of-the-art methods and can be easily adapted to various real-world applications with minimal modifications.
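The abstract does not give implementation details, but the core idea of a multiview contrastive objective whose positive pairs are defined by prompt-derived pseudo-labels can be illustrated with a minimal PyTorch-style sketch. This is an assumption-laden illustration, not the paper's actual code: the function name multiview_contrastive_loss, the temperature value, and the pseudo-label assignment via argmax prompt similarity are all hypothetical choices made here for clarity.

```python
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(embeddings, pseudo_labels, temperature=0.07):
    """Contrastive loss over multiview embeddings (illustrative sketch).

    embeddings:    (N, D) tensor, N = batch_size * num_views.
    pseudo_labels: (N,) integer pseudo-labels, e.g. obtained by assigning each
                   view to its most similar text-prompt embedding.
    Views sharing a pseudo-label are treated as positives; all other non-self
    pairs act as negatives.
    """
    z = F.normalize(embeddings, dim=1)              # unit-norm embeddings
    sim = z @ z.t() / temperature                   # pairwise scaled cosine similarity
    n = z.size(0)

    # Exclude self-similarity from both numerator and denominator.
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))

    # Positives: same pseudo-label, excluding self.
    pos_mask = pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)
    pos_mask = pos_mask & ~self_mask

    # Log-probability of each pair against all non-self pairs for that anchor.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-probability over positives per anchor; anchors with no
    # positives contribute zero loss.
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_count
    return loss.mean()

# Hypothetical usage: derive pseudo-labels from prompt similarity, then compute the loss.
# image_emb:  (B*V, D) multiview image embeddings from the vision encoder
# prompt_emb: (C, D) text embeddings of C emotion prompts from the text encoder
# pseudo_labels = (F.normalize(image_emb, dim=1)
#                  @ F.normalize(prompt_emb, dim=1).t()).argmax(dim=1)
# loss = multiview_contrastive_loss(image_emb, pseudo_labels)
```

In this sketch the joint embedding space is implicit in the shared dimensionality of image and prompt embeddings; the paper's actual positive-negative sampling strategy and "gradient-friendly" loss may differ from the standard log-softmax form used here.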
Similar Papers
Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model
CV and Pattern Recognition
Reads emotions from faces in 3D.
Self-Supervised Multi-View Representation Learning using Vision-Language Model for 3D/4D Facial Expression Recognition
CV and Pattern Recognition
Computer understands your face's feelings better.
Prompt the Unseen: Evaluating Visual-Language Alignment Beyond Supervision
CV and Pattern Recognition
Helps computers understand new pictures they haven't seen.