Score: 2

STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification

Published: March 8, 2025 | arXiv ID: 2503.06277v3

By: Siyi Du, Xinzhe Luo, Declan P. O'Regan, and more

Potential Business Impact:

Teaches computers to classify images more accurately by also learning from accompanying tabular data, even when only a few examples are labeled.

Business Areas:
Semantic Web, Internet Services

Multimodal image-tabular learning is gaining attention, yet it faces challenges due to limited labeled data. While earlier work has applied self-supervised learning (SSL) to unlabeled data, its task-agnostic nature often results in learning suboptimal features for downstream tasks. Semi-supervised learning (SemiSL), which combines labeled and unlabeled data, offers a promising solution. However, existing multimodal SemiSL methods typically focus on unimodal or modality-shared features, ignoring valuable task-relevant modality-specific information and leading to a Modality Information Gap. In this paper, we propose STiL, a novel SemiSL tabular-image framework that addresses this gap by comprehensively exploring task-relevant information. STiL features a new disentangled contrastive consistency module that learns cross-modal invariant representations of shared information while retaining modality-specific information via disentanglement. We also propose a novel consensus-guided pseudo-labeling strategy that generates reliable pseudo-labels based on classifier consensus, along with a new prototype-guided label smoothing technique that refines pseudo-label quality with prototype embeddings, thereby enhancing task-relevant information learning on unlabeled data. Experiments on natural and medical image datasets show that STiL outperforms state-of-the-art supervised, SSL, and SemiSL approaches, both image-only and multimodal. Our code is available at https://github.com/siyi-wind/STiL.
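For intuition, the sketch below illustrates how consensus-guided pseudo-labeling and prototype-guided label smoothing might look in practice. It is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the function names, confidence threshold, blending weight `alpha`, and temperature `tau` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def consensus_pseudo_labels(image_logits, tabular_logits, threshold=0.9):
    """Keep a pseudo-label only where both classifiers predict the same class
    with sufficient confidence (the classifier-consensus idea in the abstract)."""
    p_img = F.softmax(image_logits, dim=1)
    p_tab = F.softmax(tabular_logits, dim=1)
    conf_img, pred_img = p_img.max(dim=1)
    conf_tab, pred_tab = p_tab.max(dim=1)
    mask = (pred_img == pred_tab) & (conf_img >= threshold) & (conf_tab >= threshold)
    return pred_img, mask

@torch.no_grad()
def prototype_smoothed_targets(features, prototypes, hard_labels, alpha=0.7, tau=0.1):
    """Blend hard pseudo-labels with similarities to class prototypes
    (a generic stand-in for prototype-guided label smoothing)."""
    sims = F.softmax(
        F.normalize(features, dim=1) @ F.normalize(prototypes, dim=1).T / tau, dim=1
    )
    one_hot = F.one_hot(hard_labels, num_classes=prototypes.size(0)).float()
    return alpha * one_hot + (1.0 - alpha) * sims

# Toy usage: 8 unlabeled samples, 4 classes, 32-dim fused features.
img_logits, tab_logits = torch.randn(8, 4), torch.randn(8, 4)
feats, protos = torch.randn(8, 32), torch.randn(4, 32)
labels, keep = consensus_pseudo_labels(img_logits, tab_logits, threshold=0.5)
targets = prototype_smoothed_targets(feats[keep], protos, labels[keep])
```

In the full method, such smoothed pseudo-labels would supervise the unlabeled samples alongside the disentangled contrastive consistency objective described above.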

Country of Origin
🇬🇧 United Kingdom

Repos / Data Links
https://github.com/siyi-wind/STiL

Page Count
16 pages

Category
Computer Science:
Computer Vision and Pattern Recognition