
HSIC Attribution Method

View Colab tutorial | View source | 📰 Paper

The HSIC attribution method from Novello, Fel, Vigouroux1 explains a neural network's prediction for a given input image by assessing the dependence between the output and patches of the input. Thanks to the sample efficiency of the HSIC estimator, this black-box method requires fewer forward passes to produce relevant explanations.

For each patch \(i\) of the input image, let us consider two random variables: the perturbation \(X_i, i \in \{1,...,d\}\), with \(d = \text{grid_size}^2\) the number of image patches, and the output \(Y\). Let \(X^1_i,...,X^p_i\) and \(Y^1,...,Y^p\) be \(p\) samples of \(X_i\) and \(Y\). The HSIC attribution method requires selecting a kernel for the input and one for the output; these kernels define an RKHS on which the Maximum Mean Discrepancy, a dissimilarity metric between distributions, is computed. Let \(k:\mathbb{R}^2 \rightarrow \mathbb{R}\) and \(l:\mathbb{R}^2 \rightarrow \mathbb{R}\) be the kernels selected for \(X_i\) and \(Y\). HSIC is then estimated with an error \(\mathcal{O}(1/\sqrt{p})\) using the estimator $$ \mathcal{H}^p_{X_i, Y} = \frac{1}{(p-1)^2} \operatorname{tr} (KHLH), $$ where \(H, L, K \in \mathbb{R}^{p \times p}\), \(K_{ij} = k(x_i, x_j)\), \(L_{ij} = l(y_i, y_j)\) and \(H_{ij} = \delta(i=j) - p^{-1}\), with \(\delta(i=j) = 1\) if \(i=j\) and \(0\) otherwise.
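
To make the estimator concrete, here is a minimal NumPy sketch of \(\mathcal{H}^p_{X_i, Y}\) (an illustration only, not the library's internal implementation); the kernel matrices \(K\) and \(L\) are assumed to be precomputed from the \(p\) samples.

import numpy as np

def hsic_estimator(K: np.ndarray, L: np.ndarray) -> float:
    # Biased HSIC estimator: tr(KHLH) / (p - 1)^2, error in O(1/sqrt(p))
    p = K.shape[0]
    H = np.eye(p) - np.ones((p, p)) / p  # centering matrix, H_ij = delta(i=j) - 1/p
    return np.trace(K @ H @ L @ H) / (p - 1) ** 2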

In the paper Making Sense of Dependence: Efficient Black-box Explanations Using Dependence Measure, the LatinHypercube sampler is used to sample the perturbations. Note, however, that the present implementation uses TFSobolSequence as the default sampler, because LatinHypercube requires scipy \(\geq\) 1.7.0. You can nevertheless use this sampler -- which is included in the library -- by specifying it when initializing your explainer (see the last example below).

For the kernel \(k\) applied to \(X_i\), a modified Dirac kernel is used; it enables an ANOVA-like decomposition property that allows assessing pairwise patch interactions (see the paper for more details). For the kernel \(l\) applied to the output \(Y\), a Radial Basis Function (RBF) kernel is used.
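
For illustration, a plain Dirac kernel and an RBF kernel could be written as below. This is a sketch only: the paper's Dirac kernel is modified to obtain the ANOVA decomposition, and the bandwidth sigma is a choice left to the user.

import numpy as np

def dirac_kernel(x, x_prime):
    # plain Dirac kernel on perturbation patterns: 1 if equal, 0 otherwise
    return float(np.all(x == x_prime))

def rbf_kernel(y, y_prime, sigma=1.0):
    # RBF kernel on the model outputs, with bandwidth sigma
    return np.exp(-np.sum((y - y_prime) ** 2) / (2.0 * sigma ** 2))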

Tip

We recommend using a grid size of \(7 \times 7\) to define the image patches. The paper uses \(1500\) forward passes to obtain the most faithful explanations, and \(750\) for a cheaper - but still faithful - version.

Example

Low budget version

from xplique.attributions import HsicAttributionMethod

# load images, labels and model
# ...

explainer = HsicAttributionMethod(model, grid_size=7, nb_design=750)
explanations = explainer(images, labels)

High budget version

from xplique.attributions import HsicAttributionMethod

# load images, labels and model
# ...

explainer = HsicAttributionMethod(model, grid_size=7, nb_design=1500)
explanations = explainer(images, labels)

Recommended version (requires scipy \(\geq\) 1.7.0)

from xplique.attributions import HsicAttributionMethod
from xplique.attributions.global_sensitivity_analysis import LatinHypercube

# load images, labels and model
# ...

explainer = HsicAttributionMethod(model,
                                  grid_size=7, nb_design=1500,
                                  sampler=LatinHypercube(binary=True))
explanations = explainer(images, labels)


HsicAttributionMethod

HSIC Attribution Method. Compute the dependence of each input dimension with respect to the output using the Hilbert-Schmidt Independence Criterion, a perturbation function on a grid and an adapted sampling as described in the original paper.

__init__(self,
         model,
         grid_size: int = 8,
         nb_design: int = 500,
         sampler: Optional[xplique.attributions.global_sensitivity_analysis.samplers.Sampler] = None,
         estimator: Optional[xplique.attributions.global_sensitivity_analysis.hsic_estimators.HsicEstimator] = None,
         perturbation_function: Union[Callable, str, None] = 'inpainting',
         batch_size=256,
         operator: Union[xplique.commons.operators.Tasks, str,
                         Callable[[keras.src.engine.training.Model, tf.Tensor, tf.Tensor], float],
                         None] = None)

Parameters

  • model : model

    • Model used for computing explanations.

  • grid_size : int = 8

    • Cut the image into a grid of (grid_size, grid_size) cells and estimate one index per cell.

  • nb_design : int = 500

    • Number of designs (i.e. perturbed samples) for the sampler.

  • sampler : Optional[xplique.attributions.global_sensitivity_analysis.samplers.Sampler] = None

    • Sampler used to generate the (quasi-)Monte Carlo samples, LHS or QMC.

      For more options, see the sampler module. Note that the original paper uses LHS, but here the default sampler is TFSobolSequence, as LHS requires scipy \(\geq\) 1.7.0.

  • estimator : Optional[xplique.attributions.global_sensitivity_analysis.hsic_estimators.HsicEstimator] = None

    • Estimator used to compute the HSIC score.

  • perturbation_function : Union[Callable, str, None] = 'inpainting'

    • Function to call to apply the perturbation on the input. Can also be a string among 'inpainting' and 'blur'.

  • batch_size : int = 256

    • Batch size to use for the forwards.

  • operator : Union[xplique.commons.operators.Tasks, str, Callable[[keras.src.engine.training.Model, tf.Tensor, tf.Tensor], float], None] = None

    • Function g to explain; g takes three parameters (f, x, y) and should return a scalar, with f the model, x the inputs and y the targets. If None, the standard operator g(f, x, y) = f(x)[y] is used. A sketch of a custom operator follows this list.
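
For instance, a custom operator is simply a callable with the (f, x, y) signature. The following hypothetical sketch reproduces the default behaviour for one-hot classification targets (model is assumed to be loaded):

import tensorflow as tf
from xplique.attributions import HsicAttributionMethod

def my_operator(model, inputs, targets):
    # hypothetical operator: score of the target class, g(f, x, y) = f(x)[y]
    scores = model(inputs)  # (N, C) predictions
    return tf.reduce_sum(scores * targets, axis=-1)  # one score per sample

explainer = HsicAttributionMethod(model, operator=my_operator)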

explain(self,
        inputs: Union[tf.Dataset, tf.Tensor, numpy.ndarray],
        targets: Union[tf.Tensor, numpy.ndarray, None] = None) -> tf.Tensor

Compute the HSIC attribution scores according to the explainer parameters (perturbation function, grid size...). Accepts a Tensor, a numpy array or a tf.data.Dataset (in that case, targets is None).

Parameters

  • inputs : Union[tf.Dataset, tf.Tensor, numpy.ndarray]

    • Images to be explained, either a tf.data.Dataset, a Tensor or a numpy array.

      If Dataset, targets should not be provided (included in Dataset).

      Expected shape (N, W, H, C) or (N, W, H).

  • targets : Union[tf.Tensor, numpy.ndarray, None] = None

    • One-hot encoding for classification or direction {-1, +1} for regression.

      Tensor or numpy array.

      Expected shape (N, C) or (N).

Return

  • attributions_maps : tf.Tensor

    • GSA attribution method explanations, with the same shape as the inputs except for the channel dimension.
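
As a usage sketch of explain (images, labels and model are assumed to be already defined; nb_classes is a hypothetical variable holding the number of classes):

import tensorflow as tf
from xplique.attributions import HsicAttributionMethod

# images: (N, W, H, C) array, labels: (N,) integer class ids
explainer = HsicAttributionMethod(model, grid_size=7, nb_design=750)
one_hot_targets = tf.one_hot(labels, depth=nb_classes)  # expected shape (N, C)
explanations = explainer.explain(images, one_hot_targets)  # shape (N, W, H)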