Foundry: Distilling 3D Foundation Models for the Edge
By: Guillaume Letellier, Siddharth Srivastava, Frédéric Jurie, and more
Potential Business Impact:
Shrinks big AI models for small devices.
Foundation models pre-trained with self-supervised learning (SSL) on large-scale datasets have become powerful general-purpose feature extractors. However, their immense size and computational cost make them prohibitive for deployment on edge devices such as robots and AR/VR headsets. Existing compression techniques like standard knowledge distillation create efficient 'specialist' models but sacrifice the crucial, downstream-agnostic generality that makes foundation models so valuable. In this paper, we introduce Foundation Model Distillation (FMD), a new paradigm for compressing large SSL models into compact, efficient, and faithful proxies that retain their general-purpose representational power. We present Foundry, the first implementation of FMD for 3D point clouds. Foundry trains a student to learn a compressed set of SuperTokens that reconstruct the teacher's token-level representations, capturing a compact basis of its latent space. A single distilled model maintains strong transferability across diverse downstream tasks (classification, part segmentation, and few-shot scenarios), approaching full foundation-model performance while using significantly fewer tokens and FLOPs, making such models more practical for deployment on resource-constrained hardware.
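To make the SuperToken idea concrete, below is a minimal PyTorch sketch of one plausible distillation head, written under assumptions rather than from the paper's actual architecture: the class name SuperTokenDistillationHead, the use of cross-attention for both pooling and reconstruction, and the patch_centers / num_supertokens arguments are all illustrative. It only shows the general pattern the abstract describes: compress the student's tokens into K SuperTokens, then reconstruct the frozen teacher's token-level features from that compact basis and penalize the reconstruction error.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SuperTokenDistillationHead(nn.Module):
    """Illustrative sketch (assumed design, not the paper's exact method):
    pool the student's token sequence into K SuperTokens, then reconstruct
    the teacher's per-token embeddings from that compact basis."""

    def __init__(self, dim: int, num_supertokens: int = 32, num_heads: int = 4):
        super().__init__()
        # Learned queries that pool the student tokens into K SuperTokens.
        self.supertoken_queries = nn.Parameter(torch.randn(num_supertokens, dim) * 0.02)
        self.compress = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Positional queries built from point-patch centers (x, y, z) ask the
        # SuperToken basis for each teacher token's feature (assumed choice).
        self.pos_embed = nn.Linear(3, dim)
        self.reconstruct = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, student_tokens, teacher_tokens, patch_centers):
        # student_tokens: (B, M, dim); teacher_tokens: (B, N, dim); patch_centers: (B, N, 3)
        B = student_tokens.size(0)
        q = self.supertoken_queries.unsqueeze(0).expand(B, -1, -1)
        # Compress: K SuperTokens attend over the student's token sequence.
        supertokens, _ = self.compress(q, student_tokens, student_tokens)    # (B, K, dim)
        # Reconstruct: one positional query per teacher token reads the basis.
        pos_q = self.pos_embed(patch_centers)                                # (B, N, dim)
        recon, _ = self.reconstruct(pos_q, supertokens, supertokens)         # (B, N, dim)
        # Token-level reconstruction loss against the frozen teacher.
        return F.mse_loss(recon, teacher_tokens.detach())
```

In a sketch like this, the student backbone plus the distillation head are trained while the teacher stays frozen; at inference time only the compact student (and its K SuperTokens) is kept, which is where the token and FLOP savings come from.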
Similar Papers
Foundation Model for Polycrystalline Material Informatics
Computational Engineering, Finance, and Science
Teaches computers to predict how materials will behave.
Vision Foundation Models Can Be Good Tokenizers for Latent Diffusion Models
CV and Pattern Recognition
Makes AI art look better and faster.
Agglomerating Large Vision Encoders via Distillation for VFSS Segmentation
CV and Pattern Recognition
Teaches small AI to see like big AI.