Convex Mixed-Integer Programming for Causal Additive Models with Optimization and Statistical Guarantees
By: Xiaozhu Zhang , Nir Keret , Ali Shojaie and more
Potential Business Impact:
Finds hidden connections between things from data.
We study the problem of learning a directed acyclic graph from data generated according to an additive, non-linear structural equation model with Gaussian noise. We express each non-linear function through a basis expansion, and derive a maximum likelihood estimator with a group l0-regularization that penalizes the number of edges in the graph. The resulting estimator is formulated through a convex mixed-integer program, enabling the use of branch-and-bound methods to obtain a solution that is guaranteed to be accurate up to a pre-specified optimality gap. Our formulation can naturally encode background knowledge, such as the presence or absence of edges and partial order constraints among the variables. We establish consistency guarantees for our estimator in terms of graph recovery, even when the number of variables grows with the sample size. Additionally, by connecting the optimality guarantees with our statistical error bounds, we derive an early stopping criterion that allows terminating the branch-and-bound procedure while preserving consistency. Compared with existing approaches that either assume equal error variances, restrict to linear structural equation models, or rely on heuristic procedures, our method enjoys both optimization and statistical guarantees. Extensive simulations and real-data analysis show that the proposed method achieves markedly better graph recovery performance.
Similar Papers
Multilinear and Linear Programs for Partially Identifiable Queries in Quasi-Markovian Structural Causal Models
Artificial Intelligence
Finds hidden causes even with missing information.
Causal analysis of extreme risk in a network of industry portfolios
Risk Management
Finds how money problems spread through banks.
Non-negative DAG Learning from Time-Series Data
Signal Processing
Finds hidden causes in changing data.