Information-Theoretic Quality Metric of Low-Dimensional Embeddings
By: Sebastián Gutiérrez-Bernal, Hector Medel Cobaxin, Abiel Galindo González
In this work we study the quality of low-dimensional embeddings from an explicitly information-theoretic perspective. Classical evaluation metrics such as stress, rank-based neighborhood criteria, and Local Procrustes quantify distortions in distances or in local geometry, but they do not directly assess how much information is preserved when high-dimensional data are projected onto a lower-dimensional space. To address this limitation, we introduce the Entropy Rank Preservation Measure (ERPM), a local metric based on the Shannon entropy of the singular-value spectrum of neighborhood matrices and on the stable rank. ERPM quantifies the change in uncertainty between the original representation and its reduced projection, and provides both neighborhood-level indicators and a global summary statistic. To validate the metric, we compare it with the Mean Relative Rank Error (MRRE), which is distance-based, and with Local Procrustes, which is based on geometric properties, using a financial time series and a manifold commonly studied in the literature. We observe that distance-based criteria exhibit very low correlation with the geometric and spectral measures, while ERPM and Local Procrustes are strongly correlated on average but differ significantly in local regimes. We conclude that ERPM complements existing metrics by identifying neighborhoods with severe information loss, thereby enabling a more comprehensive assessment of embeddings, particularly in information-sensitive applications such as the construction of early-warning indicators.
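The two ingredients named in the abstract, the Shannon entropy of a singular-value spectrum and the stable rank, can be illustrated with a short NumPy sketch. This is not the authors' ERPM: the abstract does not say how the two quantities are combined, so the code only reports their per-neighborhood differences between the original data and the embedding. The function names, the choice to normalize the singular values (rather than their squares) into a probability vector, the centering of each neighborhood, and the use of brute-force k-nearest neighbors taken in the high-dimensional space are all assumptions made for illustration.

```python
import numpy as np


def spectral_entropy(M):
    """Shannon entropy (in nats) of the normalized singular values of M.

    Assumption: singular values are normalized to sum to 1; the paper
    may normalize differently (for example, squared singular values).
    """
    s = np.linalg.svd(M, compute_uv=False)
    total = s.sum()
    if total == 0:
        return 0.0
    p = s / total
    p = p[p > 0]  # drop exact zeros so log is defined
    return float(-(p * np.log(p)).sum())


def stable_rank(M):
    """Stable rank: squared Frobenius norm over squared spectral norm."""
    s = np.linalg.svd(M, compute_uv=False)
    if s.size == 0 or s[0] == 0:
        return 0.0
    return float((s ** 2).sum() / s[0] ** 2)


def knn_indices(X, k):
    """Brute-force k nearest neighbors (excluding the point itself)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def local_spectral_differences(X_high, X_low, k=10):
    """Per-neighborhood (entropy difference, stable-rank difference).

    Hypothetical helper: neighborhoods are defined in the original space
    and the same index sets are reused in the embedding.
    """
    rows = []
    for nbrs in knn_indices(X_high, k):
        N_high = X_high[nbrs] - X_high[nbrs].mean(axis=0)
        N_low = X_low[nbrs] - X_low[nbrs].mean(axis=0)
        rows.append((
            spectral_entropy(N_high) - spectral_entropy(N_low),
            stable_rank(N_high) - stable_rank(N_low),
        ))
    return np.array(rows)
```

Under these assumptions, a neighborhood whose entropy difference is large and positive is one where the embedding concentrates the spectrum onto fewer directions than the original data, which is the kind of local information loss the abstract describes. Averaging the rows would give one possible global summary, again as an illustrative choice rather than the paper's definition.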