Principal Subsimplex Analysis
By: Hyeon Lee, Kassel Liam Hingee, Janice L. Scealy and more
Potential Business Impact:
Finds patterns in data with zeros.
Compositional data, also referred to as simplicial data, naturally arise in many scientific domains such as geochemistry, microbiology, and economics. In such domains, obtaining sensible lower-dimensional representations and modes of variation plays an important role. A typical approach to the problem is to apply a log-ratio transformation followed by principal component analysis (PCA). However, this approach has several well-known weaknesses: it amplifies variation in minor variables; it can obscure important variation within major elements; it is not directly applicable to data sets containing zeros, and zero imputation methods give highly variable results; and it has limited ability to capture linear patterns present in compositional data. In this paper, we propose novel methods that produce nested sequences of simplices of decreasing dimensions, analogous to backwards principal component analysis. These nested sequences offer both interpretable lower-dimensional representations and linear modes of variation. In addition, our methods are applicable to data sets containing zeros without any modification. We demonstrate our methods on simulated data and on relative abundances of diatom species during the late Pliocene. Supplementary materials and R implementations for this article are available online.
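For readers unfamiliar with the baseline the abstract criticizes, the following is a minimal sketch of a log-ratio transformation followed by PCA. It is not the method proposed in the paper. The choice of the centered log-ratio (CLR) as the transformation, the helper names `clr` and `pca_scores`, and the Dirichlet-simulated data are all assumptions made for illustration. The last lines show why zeros are a problem: the logarithm of a zero part is not finite.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform of each row (assumed choice of log-ratio)."""
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)


def pca_scores(z, n_components=2):
    """PCA via SVD of the column-centered matrix.

    Returns the scores on the leading components and the share of
    variance each of those components explains.
    """
    centered = z - z.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, var_ratio[:n_components]


rng = np.random.default_rng(0)
# Hypothetical data: 50 compositions with 4 parts, each row summing to 1.
comps = rng.dirichlet(alpha=[8.0, 4.0, 2.0, 1.0], size=50)

z = clr(comps)
scores, var_ratio = pca_scores(z)

# A composition containing a zero cannot be transformed directly:
# log(0) is -inf, so the CLR output contains non-finite values.
with np.errstate(divide="ignore", invalid="ignore"):
    z_zero = clr(np.array([[0.5, 0.5, 0.0]]))
```

In practice, this is the point at which a zero-imputation step is usually inserted before the transformation, which is the step the abstract describes as giving highly variable results.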
Similar Papers
Principal Component Analysis When n < p: Challenges and Solutions
Methodology
Makes computer analysis better with messy, complex data.
Riemannian Principal Component Analysis
Machine Learning (Stat)
Analyzes complex shapes by understanding their curves.
Interpretable dimension reduction for compositional data
Methodology
Shows hidden patterns in tiny body bugs.