Score: 0

Compressibility Barriers to Neighborhood-Preserving Data Visualizations

Published: August 9, 2025 | arXiv ID: 2508.07119v1

By: Szymon Snoeck, Noah Bergam, Nakul Verma

Potential Business Impact:

Shows how to draw complex data in simple pictures.

To what extent is it possible to visualize high-dimensional datasets in a two- or three-dimensional space? We reframe this question in terms of embedding $n$-vertex graphs (representing the neighborhood structure of the input points) into metric spaces of low doubling dimension $d$, in such a way that maintains the separation between neighbors and non-neighbors. This seemingly lax embedding requirement is surprisingly difficult to satisfy. Our investigation shows that an overwhelming fraction of graphs require $d = \Omega(\log n)$. Even when considering sparse regular graphs, the situation does not improve, as an overwhelming fraction of such graphs requires $d= \Omega(\log n / \log\log n)$. The landscape changes dramatically when embedding into normed spaces. In particular, all but a vanishing fraction of graphs demand $d=\Theta(n)$. Finally, we study the implications of these results for visualizing data with intrinsic cluster structure. We find that graphs produced from a planted partition model with $k$ clusters on $n$ points typically require $d=\Omega(\log n)$, even when the cluster structure is salient. These results challenge the aspiration that constant-dimensional visualizations can faithfully preserve neighborhood structure.