Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models
By: Jiabo Huang, Chen Chen, Lingjuan Lyu
Potential Business Impact:
Combines old AI to make new, smarter AI.
Vision foundation models (VFMs) are predominantly developed using data-centric methods. These methods require training on vast amounts of data, usually with high-quality labels, which poses a bottleneck for most institutions that lack both large-scale data and high-end GPUs. On the other hand, many open-source vision models have been pretrained on domain-specific data, enabling them to distill and represent core knowledge in a form that is transferable across diverse applications. Even though these models are highly valuable assets, they remain largely under-explored as a basis for developing a general-purpose VFM. In this paper, we present a new model-driven approach for training VFMs through joint knowledge transfer and preservation. Our method unifies multiple pre-trained teacher models in a shared latent space to mitigate the "imbalanced transfer" issue caused by their distributional gaps. In addition, we introduce a knowledge preservation strategy that takes a general-purpose teacher as a knowledge base and integrates knowledge from the remaining purpose-specific teachers using an adapter module. By unifying and aggregating existing models, we build a powerful VFM that inherits its teachers' expertise without needing to train on a large amount of labeled data. Our model not only provides generalizable visual features, but also inherently supports multiple downstream tasks. Extensive experiments demonstrate that our VFM outperforms existing data-centric models across four fundamental vision tasks: image classification, object detection, semantic segmentation, and instance segmentation.
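To make the "imbalanced transfer" issue concrete, here is a minimal sketch in plain Python. It is not the paper's method: the abstract does not specify how the shared latent space is built, so per-teacher standardization is used here as a simple stand-in. The teacher names, feature values, and helper functions (`standardize`, `mse`, `per_teacher_losses`) are all hypothetical and chosen only for illustration.

```python
import math

def standardize(features):
    """Rescale a feature vector to zero mean and unit variance (assumed stand-in
    for mapping teachers into a shared space; not the paper's actual mapping)."""
    mean = sum(features) / len(features)
    var = sum((x - mean) ** 2 for x in features) / len(features)
    std = math.sqrt(var) if var > 0 else 1.0
    return [(x - mean) / std for x in features]

def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def per_teacher_losses(student, teachers, normalize_targets):
    """Distillation loss of the student against each teacher's features."""
    losses = {}
    for name, target in teachers.items():
        if normalize_targets:
            target = standardize(target)
        losses[name] = mse(student[name], target)
    return losses

# Hypothetical teachers whose features share a pattern but differ 100x in scale.
teachers = {
    "general": [0.1, -0.2, 0.3, -0.4],
    "specialist": [10.0, -20.0, 30.0, -40.0],
}
# An untrained student that outputs zeros for every teacher.
student = {name: [0.0] * 4 for name in teachers}

raw = per_teacher_losses(student, teachers, normalize_targets=False)
balanced = per_teacher_losses(student, teachers, normalize_targets=True)
print("raw:", raw)
print("balanced:", balanced)
```

With raw targets, the large-scale teacher contributes a loss several orders of magnitude bigger than the other, so a summed objective would be driven almost entirely by that teacher. After standardization, both teachers contribute equally for the same student, which is the kind of balance a shared latent space is meant to provide.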
Similar Papers
Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models
CV and Pattern Recognition
Combines old models to make new smart vision.
An Investigation of Visual Foundation Models Robustness
CV and Pattern Recognition
Makes computer vision work better in bad conditions.
Task-Specific Knowledge Distillation from the Vision Foundation Model for Enhanced Medical Image Segmentation
CV and Pattern Recognition
Teaches computers to see diseases in X-rays.