API: Attributions Methods
Context
In 2013, Simonyan et al. proposed a first attribution method, opening the way to a wide range of approaches which can be defined as follows:
Definition
The main objective of attribution techniques is to highlight the discriminating variables for decision-making. For instance, in Computer Vision (CV) tasks, the main goal is to underline the pixels of the input image(s) that contribute the most to the model's output(s).
Common API
All attribution methods inherit from the base class `BlackBoxExplainer`. This base class can be initialized with two parameters:

- `model`: the model from which we want to obtain attributions (e.g. InceptionV3, ResNet, ...)
- `batch_size`: an integer that allows either processing inputs per batch (gradient-based methods) or processing the perturbed samples of an input per batch (in which case inputs are processed one by one)

In addition, all classes inheriting from `BlackBoxExplainer` must implement an `explain` method:
```python
@abstractmethod
def explain(self,
            inputs: Union[tf.data.Dataset, tf.Tensor, np.array],
            targets: Optional[Union[tf.Tensor, np.array]] = None) -> tf.Tensor:
    raise NotImplementedError()

def __call__(self,
             inputs: tf.Tensor,
             labels: tf.Tensor) -> tf.Tensor:
    """Explain alias"""
    return self.explain(inputs, labels)
```
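For illustration, here is a minimal sketch of how this common API is typically used (assuming a trained Keras classifier `model`, a batch of `images` and their one-hot `targets`; `Saliency` is one of the methods listed below):

```python
from xplique.attributions import Saliency

# build an explainer from the model and an optional batch size
explainer = Saliency(model, batch_size=64)

# `explain` returns one attribution map per input
explanations = explainer.explain(images, targets)

# the `__call__` alias is equivalent
explanations = explainer(images, targets)
```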
`inputs`: Must be one of the following: a `tf.data.Dataset` (in which case you should not provide targets), a `tf.Tensor` or a `np.ndarray`. The expected shape depends on the type of data (a short sketch illustrating these shapes follows the list):

- If inputs are images, the expected shape of `inputs` is \((N, W, H, C)\) following TF's conventions where:
    - \(N\) is the number of inputs
    - \(W\) is the width of the images
    - \(H\) is the height of the images
    - \(C\) is the number of channels (works for \(C=3\) or \(C=1\), other values might not work or need further customization)

- If inputs are tabular data, the expected shape of `inputs` is \((N, W)\) where:
    - \(N\) is the number of inputs
    - \(W\) is the feature dimension of a single input

    Tip
    Please refer to the table below to see which methods might work with tabular data.

- (Experimental) If inputs are Time Series, the expected shape of `inputs` is \((N, T, W)\) where:
    - \(N\) is the number of inputs
    - \(T\) is the temporal dimension of a single input
    - \(W\) is the feature dimension of a single input

    Warning
    By default, `Lime` & `KernelShap` will treat such inputs as grey images. You will need to define a custom `map_to_interpret_space` when building such explainers.
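To make these conventions concrete, here is a small sketch building dummy inputs with the expected shapes (the sizes are arbitrary placeholders):

```python
import numpy as np

N = 8  # number of inputs

# images: (N, W, H, C), channels last, with C = 3 or C = 1
images = np.random.rand(N, 224, 224, 3).astype(np.float32)

# tabular data: (N, W)
tabular_inputs = np.random.rand(N, 16).astype(np.float32)

# time series (experimental): (N, T, W)
time_series_inputs = np.random.rand(N, 50, 4).astype(np.float32)
```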
Warning
If your model does not follow these conventions, it might lead to poor results.
On the bright side, your `model` only needs to be callable on `inputs` with such a shape. Therefore, you can overcome this by writing a wrapper around your model.
For example, imagine you have a trained model which takes images with channels first (i.e. \(inputs.shape = (N, C, W, H)\)). However, we saw that an explainer needs image inputs with shape \((N, W, H, C)\). Then, we can wrap the original model and redefine its call function so that it swaps the input axes before proceeding to the original call:
```python
class TemplateModelWrapper(tf.keras.Model):
    def __init__(self, ncwh_model):
        super(TemplateModelWrapper, self).__init__()
        self.model = ncwh_model

    def __call__(self, nwhc_inputs):
        # transform your NWHC inputs to NCWH inputs
        ncwh_inputs = self._transform_inputs(nwhc_inputs)
        # make predictions with the wrapped model
        outputs = self.model(ncwh_inputs)
        return outputs

    def _transform_inputs(self, nwhc_inputs):
        # include in this function all transformations
        # needed for your model to work with NWHC inputs;
        # here, for example, we swap from channels last
        # to channels first
        ncwh_inputs = tf.transpose(nwhc_inputs, [0, 3, 1, 2])
        return ncwh_inputs

wrapped_model = TemplateModelWrapper(model)
explainer = Saliency(wrapped_model)
# images should be (N, W, H, C) for the explain call
explanations = explainer.explain(images, labels)
```
Warning
In any case, when you go beyond the scope of the original API, you should take a close look at the source code to make sure your use case makes sense.
`targets`: Must be one of the following: a `tf.Tensor` or a `np.ndarray`.
Info
`targets` should be a one-hot encoding of the output you want an explanation of!

- Therefore, targets' shape must match \((N, outputs\_size)\) where:
    - \(N\) is the number of inputs
    - \(outputs\_size\) is the number of outputs
- For a classification task, the \(1\) value should be at the index of the class of interest (and only this one) in the outputs. For example, suppose there are three classes ('dogs', 'cats', 'fish') and a classifier with three outputs (the probability of belonging to each class). You have an image of a fish and you want to know why your model thinks it is a fish. Then, the corresponding target for this image will be \([0, 0, 1]\) (a short sketch for building such targets follows this list).
Warning
Sometimes the explanation might be nonsensical. One possible reason is that your model did not predict the output you asked an explanation for at all. For example, in the previous configuration, the model might have predicted a cat on your fish image. Therefore, you might want to see why it made such a prediction and use \([0, 1, 0]\) as the target.
Tip
If you replace \(1\) by \(-1\) you can also see what goes against an output prediction!
- For a regression task, you might have only one output, in which case the target will be the vector \([1]\) (and not the regression value!)
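In practice, such one-hot targets can be built with `tf.one_hot`, for instance from the classes predicted by the model (a sketch, assuming a classifier `model` with `num_classes` outputs and a batch of `images`):

```python
import tensorflow as tf

# explain the class actually predicted by the model
predicted_classes = tf.argmax(model(images), axis=-1)        # shape (N,)
targets = tf.one_hot(predicted_classes, depth=num_classes)   # shape (N, num_classes)

# or explain a specific class, e.g. 'fish' at index 2
targets = tf.one_hot([2] * len(images), depth=num_classes)
```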
Even though we made a harmonized API for all attribution methods, it might be relevant for the user to distinguish gradient-based and perturbation-based methods, often referred to respectively as white-box and black-box methods, as their hyperparameter settings might be quite different.
Perturbation-based approaches
Perturbation-based methods perturb an input with a variety of techniques and, by analyzing the resulting outputs, define an attribution representation. Therefore, there is no need to know the model architecture explicitly as long as a forward pass is available, which explains why they are also referred to as black-box methods.
Consequently, to use perturbation-based approaches you do not need a TF model; the model only has to be callable (as sketched after the table below). To learn more, please see the Callable documentation.
Xplique includes the following black-box attribution methods:

| Method Name | Tutorial |
|---|---|
| KernelShap | |
| Lime | |
| Occlusion | |
| Rise | |
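Since only a forward pass is needed, the "model" passed to these explainers can be any callable mapping a batch of inputs to a batch of outputs. Below is a hypothetical sketch with `Occlusion`, where `external_predict_fn` is a stand-in for any prediction function (e.g. a wrapper around a non-TF model):

```python
from xplique.attributions import Occlusion

def black_box_model(inputs):
    # inputs: array of shape (N, W, H, C)
    # must return scores of shape (N, outputs_size)
    return external_predict_fn(inputs)

explainer = Occlusion(black_box_model, batch_size=64)
explanations = explainer.explain(images, targets)
```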
Gradient-based approaches
These approaches are also called white-box methods as they require full access to the model architecture; notably, it should allow computing gradients with TensorFlow (for Xplique; in general, any automatic differentiation framework would work). Indeed, the core idea of gradient-based approaches is to use back-propagation, along with other techniques, not to update the model's weights (the model is already trained) but to reveal the inputs contributing the most to the output, potentially at a specific layer (a minimal sketch follows the table below). Xplique includes the following white-box attribution methods:
| Method Name | Tutorial |
|---|---|
| DeconvNet | |
| GradCAM | |
| GradCAM++ | |
| GradientInput | |
| GuidedBackpropagation | |
| IntegratedGradients | |
| Saliency | |
| SmoothGrad | |
| SquareGrad | |
| VarGrad | |
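To make this core idea concrete, here is a minimal sketch (not Xplique's implementation of any method above, but close in spirit to Saliency / GradientInput) that computes the gradient of the targeted output score with respect to the input pixels, assuming a Keras classifier `model` and one-hot `targets`:

```python
import tensorflow as tf

def gradient_attribution(model, inputs, targets):
    inputs = tf.convert_to_tensor(inputs, dtype=tf.float32)
    targets = tf.convert_to_tensor(targets, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(inputs)
        # score of the targeted output for each input
        scores = tf.reduce_sum(model(inputs) * targets, axis=-1)
    # back-propagate the score down to the input pixels
    return tape.gradient(scores, inputs)

raw_gradients = gradient_attribution(model, images, targets)
```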
In addition, these methods inherit from `WhiteBoxExplainer` (itself inheriting from `BlackBoxExplainer`). Thus, an additional `__init__` argument is added: `output_layer`. It is the layer to target for the output (e.g. logits or after softmax). If an `int` is provided, it will be interpreted as a layer index; if a `string` is provided, it will look for a layer with that name. Defaults to the last layer (see the sketch below).
Tip
It is recommended to use the layer before the softmax (i.e. the logits), as the softmax tends to saturate and can dampen the gradient signal.
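For example, under the assumption that the softmax is the last layer of `model` (so index `-2` is the logits) and that this layer is named `'logits'`, a white-box explainer could be built as follows (a sketch):

```python
from xplique.attributions import Saliency

# target the layer before the softmax by index...
explainer = Saliency(model, output_layer=-2)

# ...or by name (assuming the model has a layer named 'logits')
explainer = Saliency(model, output_layer='logits')

explanations = explainer.explain(images, targets)
```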