Enhancing interpretability of rule-based classifiers through feature graphs
By: Christel Sirocchi, Damiano Verda
Potential Business Impact:
Helps doctors see which patient features, alone and in combination, drive rule-based diagnostic predictions.
In domains where transparency and trustworthiness are crucial, such as healthcare, rule-based systems are widely used and often preferred over black-box models for decision support due to their inherent interpretability. However, as rule-based models grow more complex, discerning crucial features, understanding their interactions, and comparing feature contributions across different rule sets become challenging. To address this, we propose a comprehensive framework for estimating feature contributions in rule-based systems, introducing a graph-based feature visualisation strategy, a novel feature importance metric that is agnostic to the underlying rule-based predictor, and a distance metric for comparing rule sets based on feature contributions. Through experiments on two clinical datasets with four rule-based methods (decision trees, logic learning machines, association rules, and neural networks with rule extraction), we showcase our method's capability to uncover novel insights into the combined predictive value of clinical features, at both the dataset and class-specific levels. These insights can aid in identifying new risk factors, signature genes, and potential biomarkers, and in determining the subset of patient information that should be prioritised to enhance diagnostic accuracy. Comparative analysis of the proposed feature importance score with state-of-the-art methods on 15 public benchmarks demonstrates competitive performance and superior robustness. The method implementation is available on GitHub: https://github.com/ChristelSirocchi/rule-graph.
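As a rough illustration of the kind of pipeline the abstract describes, the sketch below builds a feature co-occurrence graph from a toy rule set and derives a naive importance score and rule-set distance from it. The rule set, function names, and scoring choices are hypothetical; they do not reproduce the metrics defined in the paper or the GitHub implementation.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical rule set: each rule is (features used in its conditions, predicted class).
rules = [
    ({"age", "blood_pressure"}, "disease"),
    ({"age", "cholesterol"}, "disease"),
    ({"blood_pressure", "cholesterol"}, "healthy"),
    ({"age"}, "healthy"),
]

def feature_graph(rule_set):
    """Weighted co-occurrence graph: edge weight = number of rules in which two features appear together."""
    edges = defaultdict(int)
    for features, _ in rule_set:
        for u, v in combinations(sorted(features), 2):
            edges[(u, v)] += 1
    return dict(edges)

def feature_importance(rule_set):
    """Naive importance: fraction of rules in which each feature occurs."""
    counts = defaultdict(int)
    for features, _ in rule_set:
        for f in features:
            counts[f] += 1
    return {f: c / len(rule_set) for f, c in counts.items()}

def rule_set_distance(rules_a, rules_b):
    """Toy distance between two rule sets: L1 gap between their feature-importance vectors."""
    imp_a, imp_b = feature_importance(rules_a), feature_importance(rules_b)
    all_feats = set(imp_a) | set(imp_b)
    return sum(abs(imp_a.get(f, 0.0) - imp_b.get(f, 0.0)) for f in all_feats)

if __name__ == "__main__":
    print("edges:", feature_graph(rules))
    print("importance:", feature_importance(rules))
```

Because the graph and scores are computed only from the extracted rules, the same sketch applies to any rule-producing model (decision trees, logic learning machines, association rules, or rule-extraction from neural networks), which is the predictor-agnostic property the abstract emphasises.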
Similar Papers
Surrogate Interpretable Graph for Random Decision Forests
Machine Learning (CS)
Shows how computer health predictions work.
Interpretable graph-based models on multimodal biomedical data integration: A technical review and benchmarking
Genomics
Helps doctors understand diseases using patient data.
Interpretable Network-assisted Random Forest+
Machine Learning (Stat)
Shows how computers learn from connected data.