An Approach to Variable Clustering: K-means in Transposed Data and its Relationship with Principal Component Analysis
By: Victor Saquicela, Kenneth Palacio-Baus, Mario Chifla
Potential Business Impact:
Groups related variables and shows how each group contributes to the main patterns of variation in the data.
Principal Component Analysis (PCA) and K-means are fundamental techniques in multivariate analysis. Although they are frequently applied independently or sequentially to cluster observations, the relationship between them has rarely been explored, especially when K-means is used to cluster variables rather than observations. This study addresses this gap by proposing a method that analyzes the relationship between the clusters of variables obtained by applying K-means to transposed data and the principal components obtained from PCA. Our approach applies PCA to the original data and K-means to the transposed data set, in which the original variables become the observations. The contribution of each variable cluster to each principal component is then quantified using measures based on variable loadings. The result is a tool for exploring how variables cluster and how those clusters contribute to the principal dimensions of variation identified by PCA.
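The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic data, column standardization before both steps, a small NumPy-only K-means with a deterministic farthest-point initialization, and one possible loading-based measure: the sum of squared loadings of a cluster's variables on each component. The abstract does not specify which measure the paper uses, and all function names here are hypothetical.

```python
import numpy as np

def pca_loadings(Z, n_components):
    """Loadings (variables x components) from the SVD of centered data."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Vt[:n_components].T  # each column has unit norm

def kmeans_labels(points, k, n_iter=100):
    """Plain Lloyd's algorithm with farthest-point initialization (an assumption)."""
    centers = [points[0]]
    for _ in range(1, k):
        # distance of every point to its nearest already-chosen center
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_contributions(loadings, labels, k):
    """Share of each component's squared loadings held by each variable cluster.

    Assumed measure: because each loading column has unit norm, the shares
    for a given component sum to 1 across clusters.
    """
    sq = loadings ** 2
    return np.vstack([sq[labels == j].sum(axis=0) for j in range(k)])

# Synthetic data (assumption): four variables driven by one latent factor,
# two variables driven by a second, independent latent factor.
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.1 * rng.normal(size=n) for _ in range(4)]
    + [f2 + 0.1 * rng.normal(size=n) for _ in range(2)]
)

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
loadings = pca_loadings(Z, n_components=2)  # PCA on the original layout
labels = kmeans_labels(Z.T, k=2)            # K-means on the transposed data
contrib = cluster_contributions(loadings, labels, k=2)

print("variable cluster labels:", labels)
print("cluster x component contributions:\n", contrib)
```

Reading the output row by row shows, for each variable cluster, how much of each principal component it accounts for under this assumed measure. Other loading-based measures, such as mean absolute loadings, could be swapped in by changing `cluster_contributions` alone.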
Similar Papers
Highly robust factored principal component analysis for matrix-valued outlier accommodation and explainable detection via matrix minimum covariance determinant
Methodology
Finds bad data points in complex pictures.
Principal Component Analysis When n < p: Challenges and Solutions
Methodology
Makes computer analysis better with messy, complex data.
TimeCluster with PCA is Equivalent to Subspace Identification of Linear Dynamical Systems
Machine Learning (CS)
Finds patterns in long, changing data.