
Optimal subsampling for generalized estimating equations in growing-dimensional longitudinal Data

Published: August 28, 2025 | arXiv ID: 2508.20803v1

By: Chunjing Li, Jiahui Zhang, Xiaohui Yuan

Potential Business Impact:

Helps analyze large longitudinal datasets faster.

Business Areas:
A/B Testing, Data and Analytics

As a powerful tool for longitudinal data analysis, the generalized estimating equations have been widely studied in the academic community. However, in large-scale settings, this approach faces pronounced computational and storage challenges. In this paper, we propose an optimal Poisson subsampling algorithm for generalized estimating equations in large-scale longitudinal data with diverging covariate dimension, and establish the asymptotic properties of the resulting estimator. We further derive the optimal Poisson subsampling probability based on A- and L-optimality criteria. An approximate optimal Poisson subsampling algorithm is proposed, which adopts a two-step procedure to construct these probabilities. Simulation studies are conducted to evaluate the performance of the proposed method under three different working correlation matrices. The results show that the method remains effective even when the working correlation matrices are misspecified. Finally, we apply the proposed method to the CHFS dataset to illustrate its empirical performance.
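The two-step procedure mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the linear mean model, the independence working correlation, the simulated data, the function names, and the choice of subsampling probabilities proportional to the norm of each subject's score contribution (a form commonly associated with L-optimality in the subsampling literature). The paper's actual probabilities and estimator may differ.

```python
import numpy as np


def simulate(n_subjects=2000, n_visits=4, p=5, seed=0):
    """Simulate longitudinal data with a shared subject-level effect (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_subjects, n_visits, p))
    beta = np.linspace(0.5, 1.5, p)
    # A subject-level random intercept induces within-subject correlation.
    b = rng.normal(scale=0.5, size=(n_subjects, 1))
    y = X @ beta + b + rng.normal(size=(n_subjects, n_visits))
    return X, y, beta


def weighted_gee_independence(X, y, w):
    """Solve sum_i w_i X_i^T (y_i - X_i beta) = 0 for a linear mean model.

    Assumes an independence working correlation, so the estimating
    equation has a closed-form weighted least-squares solution.
    """
    A = np.einsum("i,itj,itk->jk", w, X, X)
    c = np.einsum("i,itj,it->j", w, X, y)
    return np.linalg.solve(A, c)


def two_step_poisson_subsample(X, y, r0, r, rng):
    """Hypothetical two-step Poisson subsampling at the subject level."""
    n = X.shape[0]

    # Step 1: uniform Poisson pilot subsample with expected size r0.
    pilot = rng.random(n) < r0 / n
    beta0 = weighted_gee_independence(X[pilot], y[pilot], np.ones(pilot.sum()))

    # Score contribution of each subject evaluated at the pilot estimate.
    resid = y - X @ beta0
    scores = np.einsum("itj,it->ij", X, resid)
    norms = np.linalg.norm(scores, axis=1)

    # Step 2: inclusion probabilities proportional to the score norm
    # (an assumed L-optimality-style choice), capped at 1, with expected
    # subsample size at most r. Each subject is kept independently.
    probs = np.minimum(1.0, r * norms / norms.sum())
    keep = rng.random(n) < probs

    # Inverse-probability weights keep the subsample estimating equation unbiased.
    w = 1.0 / probs[keep]
    beta_hat = weighted_gee_independence(X[keep], y[keep], w)
    return beta_hat, keep, probs


X, y, beta_true = simulate()
beta_hat, keep, probs = two_step_poisson_subsample(
    X, y, r0=200, r=400, rng=np.random.default_rng(1)
)
print("subjects kept:", int(keep.sum()), "of", X.shape[0])
print("max abs error:", float(np.max(np.abs(beta_hat - beta_true))))
```

Because each subject is included independently, the realized subsample size is random and only its expectation is controlled, which is what makes Poisson subsampling attractive when the full data cannot be held in memory at once.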

Country of Origin
🇨🇳 China

Page Count
34 pages

Category
Statistics: Computation