Compressibility Barriers to Neighborhood-Preserving Data Visualizations
By: Szymon Snoeck, Noah Bergam, Nakul Verma
Potential Business Impact:
Shows how to draw complex data in simple pictures.
To what extent is it possible to visualize high-dimensional datasets in a two- or three-dimensional space? We reframe this question in terms of embedding $n$-vertex graphs (representing the neighborhood structure of the input points) into metric spaces of low doubling dimension $d$, in such a way that maintains the separation between neighbors and non-neighbors. This seemingly lax embedding requirement is surprisingly difficult to satisfy. Our investigation shows that an overwhelming fraction of graphs require $d = \Omega(\log n)$. Even when considering sparse regular graphs, the situation does not improve, as an overwhelming fraction of such graphs requires $d= \Omega(\log n / \log\log n)$. The landscape changes dramatically when embedding into normed spaces. In particular, all but a vanishing fraction of graphs demand $d=\Theta(n)$. Finally, we study the implications of these results for visualizing data with intrinsic cluster structure. We find that graphs produced from a planted partition model with $k$ clusters on $n$ points typically require $d=\Omega(\log n)$, even when the cluster structure is salient. These results challenge the aspiration that constant-dimensional visualizations can faithfully preserve neighborhood structure.
Similar Papers
Embedding Graphs as Euclidean kNN-Graphs
Computational Geometry
Maps connections to nearest neighbors in space.
Tree Embedding in High Dimensions: Dynamic and Massively Parallel
Data Structures and Algorithms
Makes computer maps of data faster and more accurate.
How Low Can You Go? Searching for the Intrinsic Dimensionality of Complex Networks using Metric Node Embeddings
Machine Learning (CS)
Makes computer networks smaller and easier to understand.