
Statistical methods for clustered competing risk data when the event types are only available in a training dataset

Published: May 4, 2025 | arXiv ID: 2505.02217v1

By: Yujie Wu, Molin Wang

Potential Business Impact:

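Helps doctors and researchers link exposures to specific types of health problems, even when the type of problem was recorded only in a smaller, separate dataset.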

Business Areas:

A/B Testing, Data and Analytics

We develop methods to analyze clustered competing risks data when the event types are available only in a training dataset and are missing in the main study. We propose to estimate exposure effects with a cause-specific proportional hazards frailty model, in which random effects account for within-cluster correlation. We propose a weighted penalized partial likelihood method in which the weights are the probabilities that each observed event is of a given type; these weights can be obtained by fitting a classification model for the event types to the training dataset. Alternatively, we propose an imputation approach in which the missing event types are imputed from the classification model's predictions. We derive the analytical variances and evaluate the finite-sample properties of our methods in an extensive simulation study. As an illustrative example, we apply our methods to estimate the associations between tinnitus and metabolic, sensory, and metabolic+sensory hearing loss in the Conservation of Hearing Study Audiology Assessment Arm.
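The weighted penalized partial likelihood idea can be illustrated with a short NumPy sketch. It is not the authors' implementation; it assumes one natural form of the objective for a single cause of interest: l(beta, b) = sum_i w_i [eta_i - log sum_{j: t_j >= t_i} exp(eta_j)] - b'b / (2 theta), where eta_i = x_i'beta + b_c(i), b_c is a Gaussian log-frailty for cluster c, and w_i = delta_i * p_i is the predicted probability (from a classifier trained on the training data) that the observed event of subject i is of the cause of interest. Further assumptions not taken from the paper: theta is held fixed, ties use the Breslow risk set, and the fit is plain Newton-Raphson with step-halving. The function names `fit_weighted_frailty_cox` and `impute_and_fit` are hypothetical, and the imputation variant (Bernoulli draws of the event type, estimates averaged over imputations) is one simple choice rather than the paper's exact scheme.

```python
import numpy as np


def fit_weighted_frailty_cox(time, event, weight, x, cluster, theta=1.0,
                             n_iter=50, tol=1e-8):
    """Hypothetical Newton-Raphson fit of a weighted, penalized cause-specific Cox model.

    time    : (n,) observed times
    event   : (n,) 1 if an event of any type was observed, 0 if censored
    weight  : (n,) assumed P(observed event is of the cause of interest),
              e.g. predicted by a classifier fit on a training set;
              only used where event == 1
    x       : (n, p) exposures/covariates, no intercept column
    cluster : (n,) integer cluster labels 0..m-1
    theta   : Gaussian log-frailty variance, held fixed in this sketch
    Ties use the Breslow risk set. Returns (beta, b).
    """
    time = np.asarray(time, dtype=float)
    cluster = np.asarray(cluster)
    x = np.asarray(x, dtype=float).reshape(len(time), -1)
    n, p = x.shape
    m = int(cluster.max()) + 1
    onehot = np.zeros((n, m))
    onehot[np.arange(n), cluster] = 1.0
    z = np.hstack([x, onehot])                  # eta = x @ beta + b[cluster]
    penalty = np.r_[np.zeros(p), np.full(m, 1.0 / theta)]
    w = np.where(np.asarray(event) == 1, weight, 0.0)

    def evaluate(params):
        """Penalized weighted log partial likelihood, its gradient and Hessian."""
        eta = z @ params
        ll, grad, hess = 0.0, np.zeros(p + m), np.zeros((p + m, p + m))
        for i in np.flatnonzero(w > 0):
            risk = time >= time[i]              # Breslow risk set at t_i
            top = eta[risk].max()
            r = np.exp(eta[risk] - top)         # shifted for numerical stability
            pi = r / r.sum()
            zr = z[risk]
            zbar = pi @ zr
            ll += w[i] * (eta[i] - top - np.log(r.sum()))
            grad += w[i] * (z[i] - zbar)
            hess -= w[i] * ((zr * pi[:, None]).T @ zr - np.outer(zbar, zbar))
        ll -= 0.5 * np.sum(penalty * params ** 2)
        return ll, grad - penalty * params, hess - np.diag(penalty)

    params = np.zeros(p + m)
    ll, grad, hess = evaluate(params)
    for _ in range(n_iter):
        step = np.linalg.solve(hess, -grad)     # Newton direction for a maximum
        new = params + step
        new_ll, new_grad, new_hess = evaluate(new)
        while new_ll < ll and np.max(np.abs(step)) > tol:
            step = step / 2                     # step-halving: avoid decreasing ll
            new = params + step
            new_ll, new_grad, new_hess = evaluate(new)
        params, ll, grad, hess = new, new_ll, new_grad, new_hess
        if np.max(np.abs(step)) < tol:
            break
    return params[:p], params[p:]


def impute_and_fit(time, event, prob, x, cluster, theta=1.0,
                   n_imputations=10, seed=0):
    """Hypothetical imputation variant: draw each event's type from
    Bernoulli(prob), refit with 0/1 weights, and average the beta estimates."""
    rng = np.random.default_rng(seed)
    betas = []
    for _ in range(n_imputations):
        is_k = (rng.random(len(time)) < prob).astype(float)
        beta, _ = fit_weighted_frailty_cox(time, event, is_k, x, cluster, theta)
        betas.append(beta)
    return np.mean(betas, axis=0)
```

With 0/1 weights the weighted objective reduces to the usual complete-data cause-specific likelihood (events of other types treated as censored), which is why the same function serves both approaches here. The sketch returns point estimates only; it does not reproduce the analytical variances derived in the paper.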