Variable selection via knockoffs in missing data settings with categorical predictors

Published: August 8, 2025 | arXiv ID: 2508.06138v1

By: Silvia Bacci, Emanuela Dreassi, Leonardo Grilli and more

Potential Business Impact:

Finds important clues in messy student test data.

Large-scale assessment data typically include numerous categorical variables, often affected by missing values. Motivated by the challenges arising in this framework, we extend the knockoffs method for selecting predictors to settings with missing values. Our proposal relies on a preliminary phase consisting of multiple imputations of missing values. Each imputed dataset is then processed using a suitable knockoff filter. We evaluate the performance of the proposed method through a simulation study, showing satisfactory results consistent with a recently advocated cutting-edge method. We apply the method to large-scale assessment data collected by INVALSI about test scores of Italian students in grade 5 with many background variables. This case study is challenging, as most predictors have unordered categories, a setting not taken into account by traditional knockoffs methods. In addition, some of the key predictors are affected by missing values. The model includes random effects to account for the multilevel structure of students nested into schools. Our proposal to implement the knockoffs method within a multiple imputation framework proves to be feasible, flexible and effective.
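The two-stage idea in the abstract (impute several times, then run a knockoff filter on each completed dataset) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles continuous predictors only, uses a simple hot-deck imputation as a stand-in for a proper multiple imputation model, uses equicorrelated Gaussian knockoffs with a ridge coefficient-difference statistic, ignores the multilevel structure, and combines the per-imputation selections by a majority vote. The abstract does not state how selections are combined, so the voting rule and all function names here (`impute_hot_deck`, `gaussian_knockoffs`, `select_with_mi_knockoffs`, `min_share`) are hypothetical.

```python
import numpy as np


def impute_hot_deck(X, rng):
    """Fill each NaN with a random draw from the observed values of its column.
    Assumption: a simplistic stand-in for a real multiple imputation model."""
    X_imp = X.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X_imp[:, j])
        observed = X_imp[~missing, j]
        X_imp[missing, j] = rng.choice(observed, size=int(missing.sum()))
    return X_imp


def gaussian_knockoffs(X, rng):
    """Sample equicorrelated Gaussian model-X knockoffs.
    Assumes standardized columns and more rows than columns."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False) + 1e-8 * np.eye(p)
    # s must satisfy 0 <= s <= 2 * lambda_min(Sigma) for a valid joint covariance.
    s = 0.99 * min(1.0, 2.0 * np.linalg.eigvalsh(Sigma)[0])
    D = s * np.eye(p)
    Sigma_inv_D = np.linalg.solve(Sigma, D)
    cond_mean = X - (X - mu) @ Sigma_inv_D
    cond_cov = 2.0 * D - D @ Sigma_inv_D
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))
    return cond_mean + rng.standard_normal((n, p)) @ L.T


def coef_difference_stats(X, X_ko, y, ridge=1.0):
    """W_j = |b_j| - |b_j knockoff| from a ridge fit on [X, X_ko]."""
    p = X.shape[1]
    A = np.hstack([X, X_ko])
    b = np.linalg.solve(A.T @ A + ridge * np.eye(2 * p), A.T @ (y - y.mean()))
    return np.abs(b[:p]) - np.abs(b[p:])


def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


def select_with_mi_knockoffs(X_missing, y, q=0.2, n_imputations=5,
                             min_share=0.5, seed=0):
    """Run the knockoff filter on each imputed dataset, then keep variables
    selected in at least `min_share` of them (hypothetical aggregation rule)."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(X_missing.shape[1])
    for _ in range(n_imputations):
        X = impute_hot_deck(X_missing, rng)
        X = (X - X.mean(axis=0)) / X.std(axis=0)
        X_ko = gaussian_knockoffs(X, rng)
        W = coef_difference_stats(X, X_ko, y)
        votes += W >= knockoff_plus_threshold(W, q)
    return np.flatnonzero(votes / n_imputations >= min_share)
```

Calling `select_with_mi_knockoffs(X_missing, y)` on a numeric matrix with `NaN` entries returns the indices of the retained columns. Unordered categorical predictors, which are the focus of the paper, would need a knockoff construction and an importance statistic defined at the level of the whole variable rather than per dummy column, which this sketch does not attempt.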

Novel Knockoff Generation and Importance Measures with Heterogeneous Data via Conditional Residuals and Local Gradients

Methodology

Finds important data in messy, mixed-up information.

20 Aug 2025 0

88%

Knockoffs for low dimensions: changing the nominal level post-hoc to gain power while controlling the FDR

Methodology

Finds hidden patterns more reliably in data.

14 Nov 2025 1

86%

Knockoffs Inference under Privacy Constraints

Methodology

Keeps data private while finding important information.

11 Jun 2025 0

Country of Origin

🇮🇹 Italy

Page Count

40 pages
