Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models
By: Mehran Tamjidi, Hamidreza Dastmalchi, Mohammadreza Alimoradijazi, and more
Potential Business Impact:
Improves 3D object recognition on noisy, incomplete, or out-of-distribution point clouds, without any retraining.
3D Vision-Language Foundation Models (VLFMs) have shown strong generalization and zero-shot recognition capabilities in open-world point cloud processing tasks. However, these models often underperform in practical scenarios where data are noisy, incomplete, or drawn from a different distribution than the training data. To address this, we propose Uni-Adapter, a novel training-free online test-time adaptation (TTA) strategy for 3D VLFMs based on dynamic prototype learning. We define a 3D cache that stores class-specific cluster centers as prototypes, continuously updated to capture intra-class variability in heterogeneous data distributions. These dynamic prototypes serve as anchors for cache-based logit computation via similarity scoring. In parallel, a graph-based label smoothing module captures inter-prototype similarities to enforce label consistency among similar prototypes. Finally, predictions from the original 3D VLFM and the refined 3D cache are unified through entropy-weighted aggregation for reliable adaptation. Without any retraining, Uni-Adapter effectively mitigates distribution shifts, achieving state-of-the-art performance across diverse 3D benchmarks and 3D VLFMs, improving accuracy over the source models by 10.55% on ModelNet-40C, 8.26% on ScanObjectNN-C, and 4.49% on ShapeNet-C.
Similar Papers
Ultra-Light Test-Time Adaptation for Vision-Language Models
CV and Pattern Recognition
Makes AI better at seeing new things.
Adaptive Cache Enhancement for Test-Time Adaptation of Vision-Language Models
CV and Pattern Recognition
Helps AI see better when things look different.
TOFA: Training-Free One-Shot Federated Adaptation for Vision-Language Models
Artificial Intelligence
Teaches AI to learn from many computers at once.