The Tree-SNE Tree Exists
By: Jack Kendrick
Potential Business Impact:
Shows data patterns at different zoom levels.
The clustering and visualisation of high-dimensional data is a ubiquitous task in modern data science. Popular techniques include nonlinear dimensionality reduction methods such as t-SNE and UMAP. These methods face the 'scale problem' of clustering: when dealing with the MNIST dataset, do we want to distinguish different digits, or different ways of writing the same digit? The answer is task-dependent and varies with scale. We revisit an idea of Robinson & Pierce-Hoffman that exploits an underlying scaling symmetry in t-SNE to replace 2-dimensional embeddings with (2+1)-dimensional embeddings, where the additional parameter accounts for scale. This gives rise to the t-SNE tree (short: tree-SNE). We prove that the optimal embedding depends continuously on the scaling parameter for all initial conditions outside a set of measure 0: the tree-SNE tree exists. The idea conceivably extends to other attraction-repulsion methods and is illustrated on several examples.
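The multi-scale construction described in the abstract can be sketched as follows. This is an illustrative approximation, not the authors' code: it runs scikit-learn's t-SNE at a sequence of scales (here represented by decreasing perplexity values, an assumption about how scale is parameterised), warm-starts each run from the previous embedding so that the layout varies continuously across scales, and stacks the 2-dimensional layers along a third scale axis to form a (2+1)-dimensional embedding. The function name `tsne_tree` and the choice of perplexity schedule are hypothetical.

```python
# Hedged sketch of the tree-SNE idea: 2-d t-SNE embeddings computed at a
# sequence of scales, each initialised from the previous scale's result,
# stacked into a (scales, n_samples, 2) array.
import numpy as np
from sklearn.manifold import TSNE


def tsne_tree(X, perplexities=(30, 10, 5), random_state=0):
    """Return an array of shape (len(perplexities), n_samples, 2)."""
    rng = np.random.default_rng(random_state)
    # Start from a small random layout; each subsequent scale is
    # warm-started from the embedding found at the previous scale,
    # which is what makes the layers fit together into a "tree".
    prev = rng.normal(scale=1e-4, size=(X.shape[0], 2))
    layers = []
    for p in perplexities:
        emb = TSNE(
            n_components=2,
            perplexity=p,
            init=prev,  # warm start: continuity across scales
            random_state=random_state,
        ).fit_transform(X)
        layers.append(emb)
        prev = emb
    return np.stack(layers)  # first axis indexes the scale parameter


# Example: three well-separated Gaussian blobs in 10 dimensions.
X = np.concatenate([
    np.random.default_rng(i).normal(loc=5 * i, size=(20, 10))
    for i in range(3)
])
tree = tsne_tree(X, perplexities=(30, 10, 5))
print(tree.shape)  # (3, 60, 2)
```

At large perplexity the three blobs appear as coarse clusters; as the perplexity decreases, finer structure within each blob can separate, which is the zoom-level behaviour the paper formalises.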
Similar Papers
Cluster and then Embed: A Modular Approach for Visualization
Machine Learning (CS)
Shows data groups clearly, without messing up the big picture.
EmbedOR: Provable Cluster-Preserving Visualizations with Curvature-Based Stochastic Neighbor Embeddings
Machine Learning (CS)
Shows hidden groups in complex data.
Decision Tree Embedding by Leaf-Means
Machine Learning (Stat)
Makes smart computer models learn faster and better.