Distributed inference for heterogeneous mixture models using multi-site data

Published: December 18, 2025 | arXiv ID: 2512.16833v1

By: Xiaokang Liu, Rui Duan, Raymond J. Carroll and more

Mixture models postulate that the overall population is a mixture of a finite number of subpopulations with unobserved membership. Fitting mixture models usually requires large sample sizes, and combining data from multiple sites can be beneficial. However, sharing individual participant data across sites is often impractical because of constraints such as data privacy concerns. Moreover, substantial heterogeneity may exist across sites, and locally identified latent classes may not be comparable across sites. We propose a unified modeling framework in which a common definition of the latent classes is shared across sites, while the mixing proportions of the latent classes are allowed to differ by site to account for between-site heterogeneity. To fit the heterogeneous mixture model on multi-site data, we propose a novel distributed Expectation-Maximization (EM) algorithm in which, at each iteration, a density-ratio-tilted surrogate Q function is constructed to approximate the standard Q function of the EM algorithm as if the data from multiple sites could be pooled together. Theoretical analysis shows that our estimator achieves the same contraction property as the estimators derived from the EM algorithm based on the pooled data.
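To make the model structure concrete, here is a minimal sketch of a mixture with shared latent classes and site-specific mixing proportions. It is not the density-ratio-tilted surrogate Q construction described in the abstract, whose details are not given here. It assumes one-dimensional Gaussian components, for which plain EM can be run exactly by having each site send only aggregated sufficient statistics to a coordinator. The function names `local_e_step` and `fit_shared_components`, the starting values `init_mu`, and the unit starting variances are all hypothetical choices made for illustration.

```python
import numpy as np


def local_e_step(x, pi_site, mu, sigma):
    """Run the E-step at one site and return aggregated statistics only.

    Assumption: 1-D Gaussian components shared by every site.
    """
    # Log density of each observation under each shared component, shape (n, C).
    log_pdf = (
        -0.5 * ((x[:, None] - mu) / sigma) ** 2
        - np.log(sigma)
        - 0.5 * np.log(2 * np.pi)
    )
    log_w = np.log(pi_site) + log_pdf
    log_w -= log_w.max(axis=1, keepdims=True)  # stabilise before exponentiating
    r = np.exp(log_w)
    r /= r.sum(axis=1, keepdims=True)  # posterior class probabilities
    # Only these per-class sums leave the site, never individual records.
    return r.sum(axis=0), r.T @ x, r.T @ (x ** 2)


def fit_shared_components(sites, init_mu, n_iter=100):
    """EM with shared component parameters and per-site mixing proportions."""
    mu = np.asarray(init_mu, dtype=float)
    n_classes = mu.size
    sigma = np.ones(n_classes)  # hypothetical starting value
    pis = [np.full(n_classes, 1.0 / n_classes) for _ in sites]

    for _ in range(n_iter):
        stats = [local_e_step(x, p, mu, sigma) for x, p in zip(sites, pis)]

        # Heterogeneous part: each site's proportions use its own counts.
        pis = [counts / counts.sum() for counts, _, _ in stats]

        # Shared part: component parameters use sums over all sites.
        n_c = sum(s[0] for s in stats)
        s1 = sum(s[1] for s in stats)
        s2 = sum(s[2] for s in stats)
        mu = s1 / n_c
        sigma = np.sqrt(np.maximum(s2 / n_c - mu ** 2, 1e-8))

    return mu, sigma, np.vstack(pis)
```

Calling `fit_shared_components([x_site1, x_site2], init_mu=[-1.0, 1.0])` returns the shared means and standard deviations plus one row of mixing proportions per site. Exact aggregation of this kind relies on the Gaussian sufficient statistics; for mixture families without such statistics, a surrogate approach like the one the abstract proposes would be needed instead.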

A Bayesian approach to learning mixtures of nonparametric components

Methodology

Finds hidden groups in messy data.

15 Dec 2025 0

87%

Distributional Treatment Effect Estimation across Heterogeneous Sites via Optimal Transport

Methodology

Creates realistic fake patient data for drug testing.

12 Nov 2025 0

87%

Improving prediction in M-estimation by integrating external information from heterogeneous populations

Methodology

Helps make predictions better using outside information.

4 Sep 2025 0
