Score: 0

Linearized Optimal Transport for Analysis of High-Dimensional Point-Cloud and Single-Cell Data

Published: October 24, 2025 | arXiv ID: 2510.22033v1

By: Tianxiang Wang , Yingtong Ke , Dhananjay Bhaskar and more

Potential Business Impact:

Compares cell data to understand sickness and treatments.

Business Areas:
Big Data Data and Analytics

Single-cell technologies generate high-dimensional point clouds of cells, enabling detailed characterization of complex patient states and treatment responses. Yet each patient is represented by an irregular point cloud rather than a simple vector, making it difficult to directly quantify and compare biological differences between individuals. Nonlinear methods such as kernels and neural networks achieve predictive accuracy but act as black boxes, offering little biological interpretability. To address these limitations, we adapt the Linear Optimal Transport (LOT) framework to this setting, embedding irregular point clouds into a fixed-dimensional Euclidean space while preserving distributional structure. This embedding provides a principled linear representation that preserves optimal transport geometry while enabling downstream analysis. It also forms a registration between any two patients, enabling direct comparison of their cellular distributions. Within this space, LOT enables: (i) \textbf{accurate and interpretable classification} of COVID-19 patient states, where classifier weights map back to specific markers and spatial regions driving predictions; and (ii) \textbf{synthetic data generation} for patient-derived organoids, exploiting the linearity of the LOT embedding. LOT barycenters yield averaged cellular profiles representing combined conditions or samples, supporting drug interaction testing. Together, these results establish LOT as a unified framework that bridges predictive performance, interpretability, and generative modeling. By transforming heterogeneous point clouds into structured embeddings directly traceable to the original data, LOT opens new opportunities for understanding immune variation and treatment effects in high-dimensional biological systems.

Country of Origin
🇺🇸 United States

Page Count
12 pages

Category
Computer Science:
Machine Learning (CS)