An Investigation of Visual Foundation Models Robustness
By: Sandeep Gupta, Roberto Passerone
Potential Business Impact:
Makes computer vision systems more reliable under adverse real-world conditions such as poor lighting and weather.
Visual Foundation Models (VFMs) are becoming ubiquitous in computer vision, powering systems for diverse tasks such as object detection, image classification, segmentation, pose estimation, and motion tracking. VFMs capitalize on seminal innovations in deep learning, such as LeNet-5, AlexNet, ResNet, VGGNet, InceptionNet, DenseNet, YOLO, and ViT, to deliver superior performance across a range of critical computer vision applications. These include security-sensitive domains like biometric verification, autonomous vehicle perception, and medical image analysis, where robustness is essential to fostering trust between the technology and its end users. This article investigates the network robustness requirements that are crucial for computer vision systems to adapt effectively to dynamic environments influenced by factors such as lighting, weather conditions, and sensor characteristics. We examine the prevalent empirical defenses and robust training strategies employed to enhance vision network robustness against real-world challenges such as distributional shifts, noisy and spatially distorted inputs, and adversarial attacks. Subsequently, we provide a comprehensive analysis of the challenges associated with these defense mechanisms, including network properties and components to guide ablation studies, as well as benchmarking metrics to evaluate network robustness.
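To make the adversarial-attack setting concrete, the sketch below generates a Fast Gradient Sign Method (FGSM) perturbation against a toy logistic-regression model standing in for a vision network. This is an illustrative assumption, not the authors' method: the names `fgsm_perturb` and `bce_loss`, the model, and the budget `eps` are all hypothetical. Robust (adversarial) training, one of the empirical defenses surveyed, would simply include such perturbed inputs in the training loop.

```python
import numpy as np

def bce_loss(x, w, b, y):
    """Binary cross-entropy loss of a logistic model on input x with label y."""
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))  # sigmoid probability
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm_perturb(x, w, b, y, eps):
    """One FGSM step: move x in the sign of the loss gradient w.r.t. the input,
    maximizing the loss within an L-infinity ball of radius eps."""
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))
    grad_x = (p - y) * w              # d(cross-entropy)/dx for the logistic model
    return x + eps * np.sign(grad_x)  # worst-case bounded perturbation

# Hypothetical example: a random input and fixed model weights.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
w = rng.normal(size=8)
b = 0.1
y = 1.0
x_adv = fgsm_perturb(x, w, b, y, eps=0.1)
```

For this linear model the attack provably increases the loss while keeping every pixel-wise change within `eps`; for deep networks the input gradient is obtained by backpropagation instead of the closed form used here.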
Similar Papers
Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models
CV and Pattern Recognition
Reuses existing pretrained models to build stronger vision models.
ActiveMark: on watermarking of visual foundation models via massive activations
CV and Pattern Recognition
Watermarks vision models to protect them from theft.