Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach
By: Pierre Adorni, Minh-Tan Pham, Stéphane May and more
Potential Business Impact:
Predicts how well computer vision models will perform on new tasks without fine-tuning them on each one.
Foundation models constitute a significant advancement in computer vision: after a single, albeit costly, training phase, they can address a wide array of tasks. In the field of Earth observation, over 75 remote sensing vision foundation models have been developed in the past four years. However, none has consistently outperformed the others across all available downstream tasks. To facilitate their comparison, we propose a cost-effective method for predicting a model's performance on multiple downstream tasks without the need for fine-tuning on each one. This method is based on what we call "capabilities encoding." The utility of this novel approach is twofold: we demonstrate its potential to simplify the selection of a foundation model for a given new task, and we employ it to offer a fresh perspective on the existing literature, suggesting avenues for future research. Code is available at https://github.com/pierreadorni/capabilities-encoding.
Similar Papers
A Genealogy of Multi-Sensor Foundation Models in Remote Sensing
CV and Pattern Recognition
Helps computers understand Earth from space better.
Towards a Unified Copernicus Foundation Model for Earth Vision
CV and Pattern Recognition
Lets satellites understand Earth better, from land to air.
Do Satellite Tasks Need Special Pretraining?
CV and Pattern Recognition
Satellite images analyzed better by regular computer programs.