Out-of-Sample Embedding with Proximity Data: Projection versus Restricted Reconstruction

Published: May 10, 2025 | arXiv ID: 2505.06756v1

By: Michael W. Trosset, Kaiyi Tan, Minh Tang and more

BigTech Affiliations: Johns Hopkins University

Potential Business Impact:

Helps computers add new data to existing charts.

Business Areas:

Image Recognition, Data and Analytics, Software

The problem of using proximity (similarity or dissimilarity) data for the purpose of "adding a point to a vector diagram" was first studied by J.C. Gower in 1968. Since then, a number of methods -- mostly kernel methods -- have been proposed for solving what has come to be called the problem of *out-of-sample embedding*. We survey the various kernel methods that we have encountered and show that each can be derived from one or the other of two competing strategies: *projection* or *restricted reconstruction*. Projection can be analogized to a well-known formula for adding a point to a principal component analysis. Restricted reconstruction poses a different challenge: how to best approximate redoing the entire multivariate analysis while holding fixed the vector diagram that was previously obtained. This strategy results in a nonlinear optimization problem that can be simplified to a unidimensional search. Various circumstances may warrant either projection or restricted reconstruction.
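The projection strategy can be illustrated in the classical multidimensional scaling setting. The sketch below is not taken from the paper: it assumes squared Euclidean dissimilarities, uses the standard double-centering construction, and the function names `cmds_fit` and `project_new_point` are hypothetical. It embeds the original points, then places a new point by projecting its centered dissimilarity vector onto the retained axes, much as one would add a point to a principal component analysis.

```python
import numpy as np

def cmds_fit(D2, d):
    """Classical MDS on an n x n matrix of squared dissimilarities (assumed setting)."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # doubly centered inner-product matrix
    evals, evecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:d]        # indices of the d largest eigenvalues
    lam, V = evals[idx], evecs[:, idx]
    X = V * np.sqrt(lam)                     # n x d configuration (requires lam > 0)
    return X, lam

def project_new_point(D2, X, lam, a2):
    """Place a new point given a2, its squared dissimilarities to the n original points."""
    # Center a2 in the same way the original matrix was centered.
    b = -0.5 * (a2 - a2.mean() - D2.mean(axis=1) + D2.mean())
    # Project onto the retained axes: y = Lambda^{-1} X^T b.
    return (X.T @ b) / lam

# Illustrative data: 8 points in the plane plus one new point.
rng = np.random.default_rng(0)
P = rng.normal(size=(8, 2))
q = rng.normal(size=2)
D2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
a2 = ((P - q) ** 2).sum(axis=1)

X, lam = cmds_fit(D2, 2)
y = project_new_point(D2, X, lam, a2)
```

When the dissimilarities are exactly Euclidean and the embedding dimension matches, the embedded new point reproduces the given distances to the original configuration. With noisy or non-Euclidean proximities the projection is only an approximation, which is the situation in which the abstract contrasts it with restricted reconstruction.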

A Scalable Approach to Clustering Embedding Projections

Human-Computer Interaction

Finds patterns in data much faster.

9 Apr 2025

Joint Embedding vs Reconstruction: Provable Benefits of Latent Space Prediction for Self Supervised Learning

Machine Learning (CS)

Makes computers learn better from messy data.

18 May 2025

Concentration bounds on response-based vector embeddings of black-box generative models

Machine Learning (Stat)

Helps understand how AI makes answers.

11 Nov 2025

Country of Origin

🇺🇸 United States

Page Count

19 pages
