Variable selection via knockoffs in missing data settings with categorical predictors
By: Silvia Bacci, Emanuela Dreassi, Leonardo Grilli and more
Potential Business Impact:
Finds important clues in messy student test data.
Large-scale assessment data typically include numerous categorical variables, often affected by missing values. Motivated by the challenges arising in this framework, we extend the knockoffs method for selecting predictors to settings with missing values. Our proposal relies on a preliminary phase consisting of multiple imputations of missing values. Each imputed dataset is then processed using a suitable knockoff filter. We evaluate the performance of the proposed method through a simulation study, showing satisfactory results consistent with a recently advocated cutting-edge method. We apply the method to large-scale assessment data collected by INVALSI about test scores of Italian students in grade 5 with many background variables. This case study is challenging, as most predictors have unordered categories, a setting not taken into account by traditional knockoffs methods. In addition, some of the key predictors are affected by missing values. The model includes random effects to account for the multilevel structure of students nested into schools. Our proposal to implement the knockoffs method within a multiple imputation framework proves to be feasible, flexible and effective.
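The two-stage idea in the abstract (impute several times, then run a knockoff filter on each imputed dataset) can be illustrated with a minimal sketch. It assumes the knockoff statistics W_j have already been computed for each imputed dataset, and it uses the standard knockoff+ threshold for a target false discovery rate q. The abstract does not say how the per-dataset selections are combined, so the majority-vote rule here (`aggregate_selections` with a hypothetical `min_share` parameter) is purely an illustrative assumption, not the authors' procedure. Constructing knockoffs for unordered categorical predictors and fitting the multilevel model are outside the scope of this sketch.

```python
from typing import Dict, Sequence, Set


def knockoff_threshold(w: Sequence[float], q: float = 0.1) -> float:
    """Knockoff+ threshold: the smallest t among the nonzero |W_j| with
    (1 + #{j: W_j <= -t}) / max(#{j: W_j >= t}, 1) <= q.
    Returns infinity when no t satisfies the bound (nothing is selected)."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(positives, 1) <= q:
            return t
    return float("inf")


def select(w: Sequence[float], q: float = 0.1) -> Set[int]:
    """Indices of predictors whose statistic reaches the threshold."""
    t = knockoff_threshold(w, q)
    return {j for j, x in enumerate(w) if x >= t}


def aggregate_selections(selections: Sequence[Set[int]],
                         min_share: float = 0.5) -> Set[int]:
    """ASSUMPTION (not from the paper): keep a predictor if it is selected
    in at least `min_share` of the imputed datasets."""
    counts: Dict[int, int] = {}
    for selected in selections:
        for j in selected:
            counts[j] = counts.get(j, 0) + 1
    m = len(selections)
    return {j for j, c in counts.items() if c / m >= min_share}


# Hypothetical W statistics from three imputed datasets (4 predictors each).
w_per_imputation = [
    [3.0, 2.0, -0.5, 0.0],
    [2.5, -0.3, 1.0, 0.0],
    [4.0, 1.5, 0.0, -0.2],
]
per_dataset = [select(w, q=0.5) for w in w_per_imputation]
final_selection = aggregate_selections(per_dataset, min_share=0.5)
```

The large value q = 0.5 is used only so that the toy vectors select something; with the knockoff+ correction, at least 1/q positive statistics are needed before any predictor can be selected.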
Similar Papers
Novel Knockoff Generation and Importance Measures with Heterogeneous Data via Conditional Residuals and Local Gradients
Methodology
Finds important data in messy, mixed-up information.
Knockoffs for low dimensions: changing the nominal level post-hoc to gain power while controlling the FDR
Methodology
Finds hidden patterns more reliably in data.
Knockoffs Inference under Privacy Constraints
Methodology
Keeps data private while finding important information.