Out-of-Sample Embedding with Proximity Data: Projection versus Restricted Reconstruction
By: Michael W. Trosset, Kaiyi Tan, Minh Tang and more
Potential Business Impact:
Helps computers add new data to existing charts.
The problem of using proximity (similarity or dissimilarity) data for the purpose of "adding a point to a vector diagram" was first studied by J.C. Gower in 1968. Since then, a number of methods -- mostly kernel methods -- have been proposed for solving what has come to be called the problem of *out-of-sample embedding*. We survey the various kernel methods that we have encountered and show that each can be derived from one or the other of two competing strategies: *projection* or *restricted reconstruction*. Projection can be analogized to a well-known formula for adding a point to a principal component analysis. Restricted reconstruction poses a different challenge: how to best approximate redoing the entire multivariate analysis while holding fixed the vector diagram that was previously obtained. This strategy results in a nonlinear optimization problem that can be simplified to a unidimensional search. Various circumstances may warrant either projection or restricted reconstruction.
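As an illustration of the projection strategy, here is a minimal sketch of adding one point to a classical multidimensional scaling configuration with a Gower-style formula. It assumes squared Euclidean dissimilarities that are exactly representable in `k` dimensions. The function names `classical_mds` and `project_new_point` are hypothetical, and the sketch is not necessarily the formulation used in the paper.

```python
import numpy as np

def classical_mds(D2, k):
    """Embed n points in k dimensions from an n x n matrix of squared dissimilarities."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # double-centered inner products
    evals, evecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:k]        # indices of the k largest eigenvalues
    lam = evals[idx]                         # assumed strictly positive here
    X = evecs[:, idx] * np.sqrt(lam)         # centered configuration, X.T @ X = diag(lam)
    return X, lam

def project_new_point(X, lam, d2_new):
    """Place a new point given its squared dissimilarities to the n original points.

    Assumption: b holds the squared norms of the rows of X, so the formula
    y = (1/2) * diag(lam)^{-1} X^T (b - d2_new) is exact only when the data
    are truly k-dimensional Euclidean.
    """
    b = np.sum(X ** 2, axis=1)
    return 0.5 * (X.T @ (b - d2_new)) / lam

# Synthetic check: five original points and one new point in the plane.
rng = np.random.default_rng(0)
P = rng.normal(size=(6, 2))
D2 = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=2)

X, lam = classical_mds(D2[:5, :5], k=2)
y = project_new_point(X, lam, D2[5, :5])

# Squared distances from the embedded new point to the embedded originals.
recovered = np.sum((X - y) ** 2, axis=1)
```

In this exact, noise-free setting `recovered` should match `D2[5, :5]` up to rounding. With noisy or non-Euclidean proximities the projected point generally does not reproduce the dissimilarities, which is the setting in which the abstract contrasts projection with restricted reconstruction.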