MCA: 2D-3D Retrieval with Noisy Labels via Multi-level Adaptive Correction and Alignment
By: Gui Zou, Chaofan Gan, Chern Hong Lim and more
Potential Business Impact:
Finds matching 3D objects from 2D pictures.
With the increasing availability of 2D and 3D data, significant advances have been made in cross-modal retrieval. Nevertheless, imperfect annotations pose considerable challenges, demanding robust solutions for 2D-3D cross-modal retrieval under noisy-label conditions. Existing methods generally address noise by dividing samples independently within each modality, making them susceptible to overfitting on corrupted labels. To address these issues, we propose a robust 2D-3D Multi-level cross-modal adaptive Correction and Alignment framework (MCA). Specifically, we introduce a Multimodal Joint label Correction (MJC) mechanism that leverages multimodal historical self-predictions to jointly model cross-modality prediction consistency, enabling reliable label refinement. Additionally, we propose a Multi-level Adaptive Alignment (MAA) strategy to enhance cross-modal feature semantics and discrimination across different levels. Extensive experiments demonstrate the superiority of our method, MCA, which achieves state-of-the-art performance on both conventional and realistic noisy 3D benchmarks, highlighting its generality and effectiveness.
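To make the label-correction idea concrete, here is a minimal sketch (not the authors' released code) of how historical self-predictions from the 2D and 3D branches could be fused to refine noisy labels via prediction consistency. The function name, the averaging-based fusion rule, and the `agree_threshold` parameter are illustrative assumptions; the paper's exact MJC formulation may differ.

```python
import numpy as np

def joint_label_correction(hist_probs_2d, hist_probs_3d, noisy_labels,
                           agree_threshold=0.9):
    """Hypothetical joint label correction via cross-modal prediction consistency.

    hist_probs_2d, hist_probs_3d: (T, N, C) softmax outputs from the last T epochs
    noisy_labels: (N,) possibly corrupted class indices
    agree_threshold: confidence required before a label is overwritten (assumed)
    """
    # Average each modality's historical self-predictions to smooth out
    # epoch-to-epoch fluctuations.
    mean_2d = hist_probs_2d.mean(axis=0)          # (N, C)
    mean_3d = hist_probs_3d.mean(axis=0)          # (N, C)

    # Jointly model the two modalities; a simple choice is the average of the
    # smoothed distributions (the paper's actual fusion rule may differ).
    joint = 0.5 * (mean_2d + mean_3d)             # (N, C)

    pred_2d = mean_2d.argmax(axis=1)
    pred_3d = mean_3d.argmax(axis=1)
    joint_pred = joint.argmax(axis=1)
    joint_conf = joint.max(axis=1)

    # Relabel only when both modalities agree and the fused prediction is
    # confident; otherwise keep the original (possibly noisy) label.
    consistent = (pred_2d == pred_3d) & (joint_conf >= agree_threshold)
    corrected = np.where(consistent, joint_pred, noisy_labels)
    return corrected, consistent
```

Gating the correction on agreement between modalities, rather than splitting clean and noisy samples independently per modality, is the consistency intuition the abstract describes; the threshold and fusion weights here are placeholders.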
Similar Papers
3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding
CV and Pattern Recognition
Helps computers understand 3D scenes from many pictures.
MCL-AD: Multimodal Collaboration Learning for Zero-Shot 3D Anomaly Detection
CV and Pattern Recognition
Finds hidden flaws in 3D objects using different clues.
MGCA-Net: Multi-Graph Contextual Attention Network for Two-View Correspondence Learning
CV and Pattern Recognition
Helps computers see matching objects in different pictures.