DiffKnock: Diffusion-based Knockoff Statistics for Neural Networks Inference
By: Heng Ge, Qing Lu
Potential Business Impact:
Finds important genes in cell data.
We introduce DiffKnock, a diffusion-based knockoff framework for high-dimensional feature selection with finite-sample false discovery rate (FDR) control. DiffKnock addresses two key limitations of existing knockoff methods: preserving complex feature dependencies and detecting non-linear associations. Our approach trains diffusion models to generate valid knockoffs and uses neural network-based gradient and filter statistics to construct antisymmetric feature importance measures. In simulations, DiffKnock achieves higher power than autoencoder-based knockoffs while maintaining the target FDR, indicating superior performance in scenarios involving complex non-linear architectures. Applied to murine single-cell RNA-seq data of LPS-stimulated macrophages, DiffKnock identifies canonical NF-$\kappa$B target genes (Ccl3, Hmox1) and regulators (Fosb, Pdgfb). These results show that, by combining the flexibility of deep generative models with rigorous statistical guarantees, DiffKnock is a powerful and reliable tool for analyzing single-cell RNA-seq data, as well as high-dimensional and structured data in other domains.
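To make the selection step concrete, the following is a minimal sketch of how antisymmetric importance statistics are typically turned into a selected feature set with FDR control, using the standard knockoff+ threshold. It is not the authors' implementation: the abstract does not specify the exact statistic, so the difference form W_j = Z_j - Z~_j and the function names `importance_difference` and `knockoff_plus_select` are assumptions made for illustration, and the importance scores are taken as given rather than computed from a trained network.

```python
def importance_difference(orig_importance, knock_importance):
    """Assumed antisymmetric statistic: W_j = Z_j - Z~_j.

    Swapping a feature with its knockoff flips the sign of W_j,
    which is the property the knockoff filter relies on.
    """
    return [z - z_tilde for z, z_tilde in zip(orig_importance, knock_importance)]


def knockoff_plus_select(w, q=0.1):
    """Return indices selected by the knockoff+ filter at target FDR level q."""
    # Candidate thresholds are the distinct nonzero magnitudes of the statistics.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        # Knockoff+ estimate of the false discovery proportion at threshold t.
        fdp_hat = (1 + negatives) / max(1, positives)
        if fdp_hat <= q:
            # Smallest threshold meeting the target: keep large positive W_j.
            return [j for j, x in enumerate(w) if x >= t]
    return []  # no threshold meets the target, so nothing is selected


# Hypothetical importance scores for ten features and their knockoffs.
z = [5.5, 4.2, 3.9, 3.1, 2.8, 2.4, 0.2, 0.9, 1.9, 1.6]
z_tilde = [0.5, 0.2, 0.4, 0.1, 0.3, 0.4, 0.7, 0.6, 0.4, 0.4]
w = importance_difference(z, z_tilde)
selected = knockoff_plus_select(w, q=0.2)
```

Features whose original importance clearly exceeds that of their knockoff receive large positive statistics and are selected; the count of large negative statistics serves as an estimate of how many of those selections are false.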