Score: 2

A Critical Analysis of the Usage of Dimensionality Reduction in Four Domains

Published: March 11, 2025 | arXiv ID: 2503.08836v2

By: Dylan Cashman , Mark Keller , Hyeon Jeon and more

BigTech Affiliations: IBM

Potential Business Impact:

Helps scientists understand complex data better.

Business Areas:
Data Mining Data and Analytics, Information Technology

Dimensionality reduction is used as an important tool for unraveling the complexities of high-dimensional datasets in many fields of science, such as cell biology, chemical informatics, and physics. Visualizations of the dimensionally reduced data enable scientists to delve into the intrinsic structures of their datasets and align them with established hypotheses. Visualization researchers have thus proposed many dimensionality reduction methods and interactive systems designed to uncover latent structures. At the same time, different scientific domains have formulated guidelines or common workflows for using dimensionality reduction techniques and visualizations for their respective fields. In this work, we present a critical analysis of the usage of dimensionality reduction in scientific domains outside of computer science. First, we conduct a bibliometric analysis of 21,249 academic publications that use dimensionality reduction to observe differences in the frequency of techniques across fields. Next, we conduct a survey of a 71-paper sample from four fields: biology, chemistry, physics, and business. Through this survey, we uncover common workflows, processes, and usage patterns, including the mixed use of confirmatory data analysis to validate a dataset and projection method and exploratory data analysis to then generate more hypotheses. We also find that misinterpretations and inappropriate usage is common, particularly in the visual interpretation of the resulting dimensionally reduced view. Lastly, we compare our observations with recent works in the visualization community in order to match work within our community to potential areas of impact outside our community.

Country of Origin
πŸ‡ΊπŸ‡Έ πŸ‡°πŸ‡· United States, Korea, Republic of

Page Count
20 pages

Category
Computer Science:
Human-Computer Interaction