Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation

13 Feb 2024 · Xuexin Chen, Ruichu Cai, Zhengting Huang, Yuxuan Zhu, Julien Horwood, Zhifeng Hao, Zijian Li, Jose Miguel Hernandez-Lobato ·

We investigate the problem of explainability in machine learning. To address this problem, Feature Attribution Methods (FAMs) measure the contribution of each feature through a perturbation test, where the difference in prediction is compared under different perturbations. However, such perturbation tests may not accurately distinguish the contributions of different features, when their change in prediction is the same after perturbation. In order to enhance the ability of FAMs to distinguish different features' contributions in this challenging setting, we propose to utilize the Probability of Necessity and Sufficiency (PNS) that perturbing a feature is a necessary and sufficient cause for the prediction to change as a measure of feature importance. Our approach, Feature Attribution with Necessity and Sufficiency (FANS), computes the PNS via a perturbation test involving two stages (factual and interventional). In practice, to generate counterfactual samples, we use a resampling-based approach on the observed samples to approximate the required conditional distribution. We demonstrate that FANS outperforms existing attribution methods on six benchmarks. Our source code is available at \url{https://github.com/DMIRLAB-Group/FANS}.

PDF Abstract