Interpretable Clustering with Adaptive Heterogeneous Causal Structure Learning in Mixed Observational Data
By: Wenrui Li, Qinghao Zhang, Xiaowo Wang
Potential Business Impact:
Finds hidden causes in messy data.
Understanding causal heterogeneity is essential for scientific discovery in domains such as biology and medicine. However, existing methods lack causal awareness, with insufficient modeling of heterogeneity, confounding, and observational constraints, leading to poor interpretability and difficulty distinguishing true causal heterogeneity from spurious associations. We propose an unsupervised framework, HCL (Interpretable Causal Mechanism-Aware Clustering with Adaptive Heterogeneous Causal Structure Learning), that jointly infers latent clusters and their associated causal structures from mixed-type observational data without requiring temporal ordering, environment labels, interventions or other prior knowledge. HCL relaxes the homogeneity and sufficiency assumptions by introducing an equivalent representation that encodes both structural heterogeneity and confounding. It further develops a bi-directional iterative strategy to alternately refine causal clustering and structure learning, along with a self-supervised regularization that balance cross-cluster universality and specificity. Together, these components enable convergence toward interpretable, heterogeneous causal patterns. Theoretically, we show identifiability of heterogeneous causal structures under mild conditions. Empirically, HCL achieves superior performance in both clustering and structure learning tasks, and recovers biologically meaningful mechanisms in real-world single-cell perturbation data, demonstrating its utility for discovering interpretable, mechanism-level causal heterogeneity.
Similar Papers
Learning Identifiable Structures Helps Avoid Bias in DNN-based Supervised Causal Learning
Machine Learning (CS)
Finds hidden causes in data better.
Hierarchical Causal Structure Learning
Methodology
Finds causes in data with different levels.
AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction
Machine Learning (CS)
Helps doctors build better health predictions for groups.