Synthetic Survival Data Generation for Heart Failure Prognosis Using Deep Generative Models

Published: September 4, 2025 | arXiv ID: 2509.04245v1

By: Chanon Puttanawarut, Natcha Fongsrisin, Porntep Amornritvanich and more

Potential Business Impact:

Creates synthetic patient data for heart failure research that can be shared without exposing real patient records.

Business Areas:
Predictive Analytics Artificial Intelligence, Data and Analytics, Software

Background: Heart failure (HF) research is constrained by limited access to large, shareable datasets due to privacy regulations and institutional barriers. Synthetic data generation offers a promising way to overcome these challenges while preserving patient confidentiality. Methods: We generated synthetic HF datasets from institutional data comprising 12,552 unique patients using five deep learning models: tabular variational autoencoder (TVAE), normalizing flow, ADSGAN, SurvivalGAN, and tabular denoising diffusion probabilistic models (TabDDPM). We comprehensively evaluated synthetic data utility through statistical similarity metrics, machine learning-based survival prediction, and privacy assessments. Results: SurvivalGAN and TabDDPM demonstrated high fidelity to the original dataset, exhibiting similar variable distributions and survival curves after applying histogram equalization. SurvivalGAN (C-indices: 0.71-0.76) and TVAE (C-indices: 0.73-0.76) achieved the strongest performance in the survival prediction evaluation, closely matching real data performance (C-indices: 0.73-0.76). Privacy evaluation confirmed protection against re-identification attacks. Conclusions: Deep learning-based synthetic data generation can produce high-fidelity, privacy-preserving HF datasets suitable for research applications. This publicly available synthetic dataset addresses critical data sharing barriers and provides a valuable resource for advancing HF research and predictive modeling.
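
The survival prediction results above are reported as C-indices (concordance indices). As a rough illustration of what that metric measures, the sketch below computes a simplified Harrell's C-index in plain Python. It is not the authors' evaluation code: the function name `concordance_index`, the toy follow-up times, event flags, and risk scores are all hypothetical, and pairs with tied times are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index (illustrative sketch only).

    A pair of subjects is comparable when the one with the shorter
    follow-up time had an observed event (event flag 1). The pair is
    concordant when that subject also has the higher predicted risk.
    Ties in risk score count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the order is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up time, event flag (1 = event, 0 = censored),
# and a model's predicted risk score for five patients.
times = [5, 8, 12, 20, 25]
events = [1, 0, 1, 1, 0]
risk_scores = [0.9, 0.4, 0.3, 0.5, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(f"C-index: {c_index:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so synthetic-data models scoring in the same 0.73-0.76 range as the real data indicate that models trained on them rank patient risk about as well.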

Country of Origin
🇹🇭 Thailand

Page Count
25 pages

Category
Computer Science:
Machine Learning (CS)