Score: 0

Exhausting the type I error level in event-driven group-sequential designs with a closed testing procedure for progression-free and overall survival

Published: December 9, 2025 | arXiv ID: 2512.08658v1

By: Moritz Fabian Danzer , Kaspar Rufibach , Jan Beyersmann and more

In oncological clinical trials, overall survival (OS) is the gold-standard endpoint, but long follow-up and treatment switching can delay or dilute detectable effects. Progression-free survival (PFS) often provides earlier evidence and is therefore frequently used together with OS as multiple primary endpoints. Since in certain scenarios trial success may be defined if one of the two hypotheses involved can be rejected, a correction for multiple testing may be deemed necessary. Because PFS and OS are generally highly dependent, their test statistics are typically correlated. Ignoring this dependency (e.g. via a simple Bonferroni correction) is not power optimal. We develop a group-sequential testing procedure for the multiple primary endpoints PFS and OS that fully exhausts the family-wise error rate (FWER) by exploiting their dependence. Specifically, we characterize the joint asymptotic distribution of log-rank statistics across endpoints and multiple event-driven analysis cutoffs. Furthermore, we show that we can consistently estimate the covariance structure. Embedding these results in a closed testing procedure, we can recalculate critical values of the test statistics in order to spend the available type I error optimally. An important extension to the current literature is that we allow for both interim and final analysis to be event-driven. Simulations based on illness-death multi-state models empirically confirm FWER control for moderate to large sample sizes. Compared with a simple Bonferroni correction, the proposed methods recover roughly two thirds of the power loss for OS, increase disjunctive and conjunctive power, and enable meaningful early stopping. In planning, these gains translate into about 5% fewer OS events required to reach the targeted power. We also discuss practical issues in the implementation of such designs and possible extensions of the introduced method.

Category
Statistics:
Methodology