Out-of-Sample Embedding with Proximity Data: Projection versus Restricted Reconstruction

Published: May 10, 2025 | arXiv ID: 2505.06756v1

By: Michael W. Trosset, Kaiyi Tan, Minh Tang, and more

Affiliations: Johns Hopkins University

Potential Business Impact:

Helps computers add new data points to an existing chart without moving the points already plotted.

Business Areas:
Image Recognition, Data and Analytics, Software

The problem of using proximity (similarity or dissimilarity) data for the purpose of "adding a point to a vector diagram" was first studied by J.C. Gower in 1968. Since then, a number of methods -- mostly kernel methods -- have been proposed for solving what has come to be called the problem of *out-of-sample embedding*. We survey the various kernel methods that we have encountered and show that each can be derived from one or the other of two competing strategies: *projection* or *restricted reconstruction*. Projection can be analogized to a well-known formula for adding a point to a principal component analysis. Restricted reconstruction poses a different challenge: how to best approximate redoing the entire multivariate analysis while holding fixed the vector diagram that was previously obtained. This strategy results in a nonlinear optimization problem that can be simplified to a unidimensional search. Various circumstances may warrant either projection or restricted reconstruction.
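To make the two strategies concrete, here is a minimal sketch that assumes classical multidimensional scaling (CMDS) as the multivariate analysis and takes the new point's squared dissimilarities to the original points as input. It is not the paper's code. The function names (`cmds`, `oos_kernel`, `project`, `restricted_reconstruction`) are hypothetical, and the restricted-reconstruction objective is an assumed illustrative choice: fit the new row of the double-centered inner-product matrix, plus its new diagonal entry `beta`, while holding the configuration `X` fixed. `project` is the least-squares (Gower/Nyström-style) formula. For exactly Euclidean dissimilarities it returns the new point's principal-component scores, which is the PCA analogy mentioned above. Under the assumed objective, setting the gradient to zero leaves a single scalar unknown `t`, so the nonlinear problem collapses to a one-dimensional root search.

```python
import numpy as np


def cmds(delta2: np.ndarray, d: int):
    """Classical MDS on an n x n matrix of squared dissimilarities.

    Returns the n x d configuration X (so X^T X = diag(lam)) and the d
    largest eigenvalues lam of B = -1/2 * J @ delta2 @ J.
    """
    n = delta2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ delta2 @ J                      # double-centered matrix
    evals, evecs = np.linalg.eigh(B)               # ascending eigenvalues
    idx = np.argsort(evals)[::-1][:d]              # indices of the d largest
    lam, Q = evals[idx], evecs[:, idx]
    if np.any(lam <= 0):
        raise ValueError("B must have d positive eigenvalues")
    return Q * np.sqrt(lam), lam


def oos_kernel(delta2: np.ndarray, a: np.ndarray):
    """Double-center the new point's squared dissimilarities a (length n)
    against the original points: k[i] plays the role of <x_i, y> and beta
    the role of <y, y> in the augmented inner-product matrix."""
    row_means = delta2.mean(axis=1)
    grand_mean = delta2.mean()
    k = -0.5 * (a - row_means - a.mean() + grand_mean)
    beta = a.mean() - 0.5 * grand_mean
    return k, beta


def project(X: np.ndarray, lam: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Projection: least-squares solution of X y ~ k, i.e. y = diag(lam)^-1 X^T k."""
    return (X.T @ k) / lam


def restricted_reconstruction(X, lam, k, beta, tol=1e-12, max_iter=200):
    """Minimize the assumed objective f(y) = 2*||X y - k||^2 + (y.y - beta)^2
    with X held fixed.

    Zero gradient gives y(t) = (diag(lam) + t I)^-1 X^T k with
    t = ||y(t)||^2 - beta, a scalar equation g(t) = 0 solved by bisection.
    """
    u = X.T @ k

    def y_of(t):
        return u / (lam + t)

    def g(t):
        # strictly decreasing on t > -min(lam), where the Hessian of f is positive definite
        return np.sum(y_of(t) ** 2) - beta - t

    lo = -lam.min() + 1e-12 * lam.max()                 # just right of the pole
    hi = max(0.0, np.sum(y_of(0.0) ** 2) - beta) + 1.0  # guarantees g(hi) < 0
    if g(lo) <= 0:
        raise ValueError("no sign change: degenerate 'hard case' not handled")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid                                    # root lies to the right
        else:
            hi = mid
        if hi - lo <= tol * (1.0 + abs(mid)):
            break
    return y_of(0.5 * (lo + hi))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(30, 5))                   # original points (only distances are used)
    z_new = rng.normal(size=5)
    delta2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    a = ((Z - z_new) ** 2).sum(axis=1)             # new point's squared dissimilarities

    X, lam = cmds(delta2, d=2)
    k, beta = oos_kernel(delta2, a)
    print("projection:               ", project(X, lam, k))
    print("restricted reconstruction:", restricted_reconstruction(X, lam, k, beta))
```

In this formulation, setting `t = 0` in `y(t)` recovers `project`. The two strategies therefore differ only through the scalar `t`, which is driven by how well the fixed diagram can match the new point's squared norm `beta`. When the original and new points lie exactly in the `d`-dimensional embedding space, the root is `t = 0` and both strategies return the same coordinates.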

Country of Origin
🇺🇸 United States

Page Count
19 pages

Category
Statistics: Machine Learning (Stat)