Representation Learning with Adaptive Superpixel Coding
By: Mahmoud Khalil, Ahmad Khalil, Alioune Ngom
Potential Business Impact:
Helps computer vision systems understand images more accurately.
Deep learning vision models are typically tailored to specific modalities and often rely on domain-specific assumptions, such as the grid structure used by nearly all existing vision models. In this work, we propose a self-supervised Transformer-based model, which we call Adaptive Superpixel Coding (ASC). The key insight of our model is to overcome a limitation of traditional Vision Transformers, which depend on fixed-size, non-adaptive patch partitioning. Instead, ASC employs adaptive superpixel layers that dynamically adjust to the underlying image content. We analyze the key properties that make the approach effective, and find that our method outperforms widely used alternatives on standard downstream image benchmarks.
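To make the contrast with fixed patch partitioning concrete, here is a minimal, illustrative sketch of superpixel-style tokenization. This is not the paper's ASC layer (the abstract does not specify its implementation); it is a toy SLIC-like clustering in pure NumPy, where pixels are grouped by color and position and each resulting superpixel is mean-pooled into one token. The function name `superpixel_tokens` and all parameters are hypothetical. The point is that the number and shape of tokens adapt to image content, unlike a fixed 16x16 patch grid.

```python
import numpy as np

def superpixel_tokens(image, n_segments=16, n_iters=5):
    """Toy SLIC-style superpixel tokenization (illustrative sketch,
    not the paper's ASC layer): cluster pixels by (color, position),
    then mean-pool each cluster's colors into one token."""
    H, W, C = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Per-pixel feature: color channels plus normalized coordinates.
    feats = np.concatenate(
        [image.reshape(-1, C),
         np.stack([ys.ravel() / H, xs.ravel() / W], axis=1)],
        axis=1)
    # Initialize cluster centers on a regular grid, as SLIC does.
    g = int(np.sqrt(n_segments))
    idx = [int((i + 0.5) * H / g) * W + int((j + 0.5) * W / g)
           for i in range(g) for j in range(g)]
    centers = feats[idx].copy()
    for _ in range(n_iters):
        # Assign each pixel to its nearest center, then update centers.
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for k in range(len(centers)):
            mask = assign == k
            if mask.any():
                centers[k] = feats[mask].mean(0)
    # One token per non-empty superpixel: mean color of its pixels.
    tokens = np.stack([image.reshape(-1, C)[assign == k].mean(0)
                       for k in range(len(centers)) if (assign == k).any()])
    return tokens, assign.reshape(H, W)

# Usage: the token count follows the image content (<= n_segments),
# rather than being fixed by a patch grid.
img = np.random.rand(32, 32, 3)
tokens, segmentation = superpixel_tokens(img)
```

In a Transformer pipeline, these pooled superpixel features would play the role that flattened fixed-size patches play in a standard ViT, giving the model a content-adaptive input partition.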
Similar Papers
In Pursuit of Pixel Supervision for Visual Pre-training
CV and Pattern Recognition
Teaches computers to understand images without human labels.
Selective Masking based Self-Supervised Learning for Image Semantic Segmentation
CV and Pattern Recognition
Teaches computers to see better by guessing missing parts.
Deep Attention-guided Adaptive Subsampling
CV and Pattern Recognition
Makes vision models run faster and cheaper.