An Interactive Tool for Analyzing High-Dimensional Clusterings
By: Justin Lin, Julia Fukuyama
Potential Business Impact:
Helps see hidden patterns in complex data.
Technological advances have spurred an increase in data complexity and dimensionality. We are now in an era in which data sets containing thousands of features are commonplace. To digest and analyze such high-dimensional data, dimension reduction techniques have been developed and advanced along with computational power. Of these techniques, nonlinear methods are most commonly employed because of their ability to construct visually interpretable embeddings. Unlike linear methods, they stretch and shrink space non-uniformly to create a visual impression of the high-dimensional data. Since capturing high-dimensional structures in far fewer dimensions requires drastic manipulation of space, nonlinear dimension reduction methods are known to occasionally produce false structures, especially in noisy settings. To address this phenomenon, we developed an interactive tool that enables analysts to better understand and diagnose their dimension reduction results. It uses various analytical plots to provide a multi-faceted perspective on the results and help determine whether the structures they show are legitimate. The tool is available as an R package named DRtool.
Similar Papers
DRtool: An Interactive Tool for Analyzing High-Dimensional Clusterings
Applications
Helps see hidden patterns in complex data.
A general framework for adaptive nonparametric dimensionality reduction
Machine Learning (Stat)
Finds best way to show complex data simply.
Navigating High-Dimensional Backstage: A Guide for Exploring Literature for the Reliable Use of Dimensionality Reduction
Human-Computer Interaction
Helps people understand data pictures better.