Extracting Uncertainty Estimates from Mixtures of Experts for Semantic Segmentation
By: Svetlana Pavlitska, Beyza Keskin, Alwin Faßbender, and more
Potential Business Impact:
Helps self-driving cars know when they are unsure.
Estimating accurate and well-calibrated predictive uncertainty is important for enhancing the reliability of computer vision models, especially in safety-critical applications like traffic scene perception. While ensemble methods are commonly used to quantify uncertainty by combining multiple models, a mixture of experts (MoE) offers an efficient alternative by leveraging a gating network to dynamically weight expert predictions based on the input. Building on the promising use of MoEs for semantic segmentation in our previous works, we show that well-calibrated predictive uncertainty estimates can be extracted from MoEs without architectural modifications. We investigate three methods to extract predictive uncertainty estimates: predictive entropy, mutual information, and expert variance. We evaluate these methods for an MoE with two experts trained on a semantic split of the A2D2 dataset. Our results show that MoEs yield more reliable uncertainty estimates than ensembles in terms of conditional correctness metrics under out-of-distribution (OOD) data. Additionally, we evaluate routing uncertainty computed via gate entropy and find that simple gating mechanisms lead to better calibration of routing uncertainty estimates than more complex classwise gates. Finally, our experiments on the Cityscapes dataset suggest that increasing the number of experts can further enhance uncertainty calibration. Our code is available at https://github.com/KASTEL-MobilityLab/mixtures-of-experts/.
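The abstract names four uncertainty measures: predictive entropy and mutual information of the gate-weighted mixture, the variance across expert predictions, and the entropy of the gate distribution itself. A minimal NumPy sketch of how such quantities can be computed for a single pixel is shown below. The function name and the exact definition of expert variance (gate-weighted variance of expert class probabilities, summed over classes) are assumptions for illustration; the paper's implementation may differ in detail.

```python
import numpy as np

def moe_uncertainties(expert_probs, gate_weights, eps=1e-12):
    """Illustrative uncertainty estimates for one pixel of an MoE.

    expert_probs: (E, C) per-expert softmax class probabilities.
    gate_weights: (E,) gating weights, assumed to sum to 1.
    Returns (predictive_entropy, mutual_information,
             expert_variance, gate_entropy).
    """
    expert_probs = np.asarray(expert_probs, dtype=float)
    gate_weights = np.asarray(gate_weights, dtype=float)

    # Mixture prediction: gate-weighted average of expert predictions.
    p_mix = gate_weights @ expert_probs  # shape (C,)

    # Predictive entropy: total uncertainty of the mixture prediction.
    predictive_entropy = -np.sum(p_mix * np.log(p_mix + eps))

    # Expected entropy of the individual experts (aleatoric component).
    expert_entropies = -np.sum(expert_probs * np.log(expert_probs + eps),
                               axis=1)
    expected_entropy = gate_weights @ expert_entropies

    # Mutual information: disagreement between experts
    # (epistemic component), predictive minus expected entropy.
    mutual_information = predictive_entropy - expected_entropy

    # Expert variance (assumed definition): gate-weighted variance of
    # expert probabilities around the mixture, summed over classes.
    expert_variance = np.sum(gate_weights @ (expert_probs - p_mix) ** 2)

    # Routing uncertainty: entropy of the gate distribution.
    gate_entropy = -np.sum(gate_weights * np.log(gate_weights + eps))

    return (predictive_entropy, mutual_information,
            expert_variance, gate_entropy)
```

When both experts agree, mutual information and expert variance collapse to zero while predictive entropy stays positive; when they disagree, the epistemic terms grow, which is what makes them useful for flagging out-of-distribution inputs.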
Similar Papers
Bayesian Mixture of Experts For Large Language Models
Machine Learning (CS)
Helps AI know when it's unsure about answers.
A Mixture of Experts Gating Network for Enhanced Surrogate Modeling in External Aerodynamics
Machine Learning (CS)
Makes car designs faster by predicting air flow.
Mixture of Many Zero-Compute Experts: A High-Rate Quantization Theory Perspective
Machine Learning (CS)
Makes computer models learn faster with more experts.