Smooth Flow Matching
By: Jianbin Tan, Anru R. Zhang
Potential Business Impact:
Creates fake patient data to study health.
Functional data, i.e., smooth random functions observed over a continuous domain, are increasingly available in areas such as biomedical research, health informatics, and epidemiology. However, effective statistical analysis for functional data is often hindered by challenges such as privacy constraints, sparse and irregular sampling, infinite dimensionality, and non-Gaussian structures. To address these challenges, we introduce a novel framework named Smooth Flow Matching (SFM), tailored for generative modeling of functional data to enable statistical analysis without exposing sensitive real data. Built upon flow-matching ideas, SFM constructs a semiparametric copula flow to generate infinite-dimensional functional data, free from Gaussianity or low-rank assumptions. It is computationally efficient, handles irregular observations, and guarantees the smoothness of the generated functions, offering a practical and flexible solution in scenarios where existing deep generative methods are not applicable. Through extensive simulation studies, we demonstrate the advantages of SFM in terms of both synthetic data quality and computational efficiency. We then apply SFM to generate clinical trajectory data from the MIMIC-IV patient electronic health records (EHR) longitudinal database. Our analysis showcases the ability of SFM to produce high-quality surrogate data for downstream statistical tasks, highlighting its potential to boost the utility of EHR data for clinical applications.
Similar Papers
Functional Mean Flow in Hilbert Space
Machine Learning (CS)
Creates new data like pictures or sounds quickly.
Multi-Marginal Stochastic Flow Matching for High-Dimensional Snapshot Data at Irregular Time Points
Machine Learning (CS)
Tracks complex changes from few, uneven snapshots.
Longitudinal Flow Matching for Trajectory Modeling
Machine Learning (CS)
Predicts future body changes from brain scans.