Wasserstein Distances Made Explainable: Insights into Dataset Shifts and Transport Phenomena
By: Philip Naumann, Jacob Kauffmann, Grégoire Montavon
Potential Business Impact:
Explains why two datasets differ by identifying the subgroups, features, or subspaces that contribute most to the Wasserstein distance between them.
Wasserstein distances provide a powerful framework for comparing data distributions. They can be used to analyze processes over time or to detect inhomogeneities within data. However, simply calculating the Wasserstein distance or analyzing the corresponding transport map (or coupling) may not be sufficient for understanding what factors contribute to a high or low Wasserstein distance. In this work, we propose a novel solution based on Explainable AI that allows us to efficiently and accurately attribute Wasserstein distances to various data components, including data subgroups, input features, or interpretable subspaces. Our method achieves high accuracy across diverse datasets and Wasserstein distance specifications, and its practical utility is demonstrated in two use cases.
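As a rough illustration of what attributing a Wasserstein distance to input features can mean, the sketch below computes the squared 2-Wasserstein distance between two equal-size samples and splits it into per-feature terms along the optimal coupling. This is not the Explainable AI method proposed in the paper. It is only the naive coupling-based decomposition that the abstract describes as potentially insufficient, shown here as a baseline. The function name `w2_squared_feature_split` and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def w2_squared_feature_split(x, y):
    """Squared 2-Wasserstein distance between two equal-size samples
    with uniform weights, split into additive per-feature terms.

    Hypothetical helper for illustration; not the paper's attribution method.
    """
    # Pairwise squared Euclidean costs, shape (n, n).
    diff = x[:, None, :] - y[None, :, :]
    cost = (diff ** 2).sum(axis=-1)

    # For equal-size uniform samples an optimal coupling can be taken
    # to be a permutation, so an assignment solver is enough.
    rows, cols = linear_sum_assignment(cost)

    # The squared Euclidean cost is a sum over features, so the total
    # splits exactly into one term per feature.
    matched = x[rows] - y[cols]
    per_feature = (matched ** 2).mean(axis=0)
    return per_feature.sum(), per_feature


# Synthetic data (assumption): two Gaussian samples that differ
# only by a mean shift in feature 0.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = rng.normal(size=(200, 3))
y[:, 0] += 3.0

total, per_feature = w2_squared_feature_split(x, y)
print("squared W2:", total)
print("per-feature share:", per_feature / total)
```

In this toy setting the shifted feature accounts for most of the squared distance. The split depends on the particular coupling found and on the squared Euclidean ground cost, which is one reason a more principled attribution scheme is of interest.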
Similar Papers
On the Information Processing of One-Dimensional Wasserstein Distances with Finite Samples
Machine Learning (CS)
Finds important differences in data patterns.
Measures of Dependence based on Wasserstein distances
Statistics Theory
Measures how things are connected, even in weird ways.
Wasserstein-based Kernels for Clustering: Application to Power Distribution Graphs
Machine Learning (CS)
Groups similar complex things, even without numbers.