
API: Attribution Methods

Context

In 2013, Simonyan et al. proposed one of the first attribution methods, opening the way to a wide range of approaches that can be defined as follows:

Definition

The main objective of attribution techniques is to highlight the discriminating variables for decision-making. For instance, in Computer Vision (CV) tasks, the goal is to underline the pixels of the input image(s) that contribute the most to the model’s output(s).

Common API

explainer = Method(model, batch_size, operator)
explanation = explainer(inputs, targets)

The API has two steps:

  • explainer instantiation: Method is an attribution method among those listed in the methods tables below. It inherits from the base class BlackBoxExplainer. Its initialization takes 3 parameters apart from the method-specific ones and generates an explainer:

    • model: the model from which we want to obtain attributions (e.g. InceptionV3, ResNet, ...), see the model section for more details and specifications.
    • batch_size: an integer used either to process inputs per batch (gradient-based methods) or to process perturbed samples of a single input per batch (in which case inputs are processed one by one).
    • operator: an enum identifying the task of the model (Classification by default), a string identifying the task, or the function to explain, see the Tasks and operator section for more details.
  • explainer call: the call to explainer generates the explanations; it takes two parameters (a complete example follows this list):

    • inputs: the samples for which explanations are requested, see the inputs section for more details.
    • targets: a parameter specifying what to explain in the inputs; it changes depending on the operator, see the targets section for more details.
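For instance, a minimal sketch with a Keras image classifier (the model, images, and labels variables are assumed to exist, and the choice of Saliency and of the parameter values is purely illustrative):

from xplique import Tasks
from xplique.attributions import Saliency

# instantiate the explainer: batches of 64 samples and the (default)
# classification operator made explicit
explainer = Saliency(model, batch_size=64, operator=Tasks.CLASSIFICATION)

# labels one-hot encodes, for each image, the class to explain
explanations = explainer(images, labels)  # shape (N, H, W, 1) for images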

Info

The __call__ method of explainers is an alias for the explain method.

Info

This documentation page covers the different parameters of the common API for attribution methods. This API is shared across the different tasks covered by Xplique.

Methods

Even though we designed a harmonized API for all attribution methods, it may be relevant for the user to distinguish perturbation-based methods from gradient-based methods, often referred to respectively as black-box and white-box methods, as their hyperparameter settings can differ significantly.

Perturbation-based approaches

Perturbation-based methods focus on perturbing an input with a variety of techniques and, by analyzing the resulting outputs, build an attribution representation. Thus, there is no need to know the model architecture explicitly as long as a forward pass is available, which explains why they are also referred to as black-box methods.

Therefore, to use perturbation-based approaches you do not need a TF model. To know more, please see the Callable documentation.

Xplique includes the following black-box attributions:

| Method Name and Documentation link | Tutorial | Available with TF | Available with PyTorch* |
| :--- | :---: | :---: | :---: |
| KernelShap | Open In Colab | ✔ | ✔ |
| Lime | Open In Colab | ✔ | ✔ |
| Occlusion | Open In Colab | ✔ | ✔ |
| Rise | Open In Colab | ✔ | ✔ |
| Sobol Attribution | Open In Colab | ✔ | ✔ |
| Hsic Attribution | Open In Colab | ✔ | ✔ |

*: Before using a PyTorch model it is highly recommended to read the dedicated documentation

Gradient-based approaches

These approaches are also called white-box methods as they require full access to the model's architecture; notably, the model must allow computing gradients. Indeed, the core idea of gradient-based approaches is to use back-propagation, not to update the model’s weights (the model is already trained) but to reveal the most contributing inputs, potentially in a specific layer. All methods are available when the model works with TensorFlow, but most of them also work with PyTorch (see the Xplique for PyTorch documentation).

| Method Name and Documentation link | Tutorial | Available with TF | Available with PyTorch* |
| :--- | :---: | :---: | :---: |
| DeconvNet | Open In Colab | ✔ | ❌ |
| GradCAM | Open In Colab | ✔ | ❌ |
| GradCAM++ | Open In Colab | ✔ | ❌ |
| GradientInput | Open In Colab | ✔ | ✔ |
| GuidedBackpropagation | Open In Colab | ✔ | ❌ |
| IntegratedGradients | Open In Colab | ✔ | ✔ |
| Saliency | Open In Colab | ✔ | ✔ |
| SmoothGrad | Open In Colab | ✔ | ✔ |
| SquareGrad | Open In Colab | ✔ | ✔ |
| VarGrad | Open In Colab | ✔ | ✔ |

*: Before using a PyTorch model it is highly recommended to read the dedicated documentation

In addition, these methods inherit from WhiteBoxExplainer (itself inheriting from BlackBoxExplainer). Thus, two additional __init__ arguments are added:

  • output_layer: the layer to target for the output (e.g. logits or after softmax). If an int is provided, it is interpreted as a layer index; if a string is provided, it is looked up as a layer name. Defaults to the last layer.
  • reducer: for images, most gradient-based methods provide a value for each channel; however, for consistency, it was decided that image explanations will have the shape \((n, h, w, 1)\). Therefore, gradient-based methods need to reduce the channel dimension of their image explanations, and the reducer parameter chooses how to do it among {"mean", "min", "max", "sum", None}. If None is given, the channel dimension is not reduced. The default value is "mean" for all methods except Saliency, which uses "max" to comply with the original paper, and GradCAM and GradCAM++, which are not concerned. A short sketch follows this list.
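For illustration, a minimal sketch (assuming a Keras image classifier model; the layer index is illustrative):

from xplique.attributions import Saliency

# target the penultimate layer (here assumed to hold the logits,
# i.e. before the softmax) and keep the channel dimension
explainer = Saliency(model, output_layer=-2, reducer=None)
explanations = explainer(images, labels)  # shape (N, H, W, C) since reducer=None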

Tip

It is recommended to use the layer before Softmax.

Warning

The output_layer parameter will work well with TensorFlow models. However, it will not work with PyTorch models. For PyTorch, one should directly manipulate the model to focus on the layers of interest.

Info

The "white-box" explainers that work with PyTorch are those that only require the gradient of the model without having to "modify" some part of the model (e.g. Deconvnet will commute all original ReLU by a custom ReLU policy)

model

model is the primary parameter of attribution methods: it is the model from which explanations are required. Even though we tried to support a wide range of models, our attribution framework relies on some assumptions, which are described in this section.

Warning

In case the model does not respect the specifications, a wrapper will be needed as described in the Models not respecting the specifications section.

In practice, we expect the model to be callable on the inputs parameter -- i.e. we can do model(inputs). We expect this call to produce the outputs variable, i.e. the predictions of the model on those inputs. As most attribution methods need to manipulate and/or link the outputs to the inputs, we assume the latter follow the conventional shapes described in the inputs section.
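Concretely, a quick sanity check (a sketch, assuming model and inputs already exist):

# the model must be callable on a batch of inputs
# and return one prediction per input
outputs = model(inputs)
assert outputs.shape[0] == inputs.shape[0]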

Info

Depending on the task and operator there may be supplementary specifications for the model, mainly on the output of the model.

Tasks and operator

operator is one of the main parameters for both attribution methods and metrics. It defines the function that we want to explain. E.g., in the case of a classifier model, the function that we might want to explain is the one that, given a target, provides the score of the model for that specific target -- i.e. \(model(input)[target]\).

Note

The operator parameter is a feature available for versions \(> 1.0\). The default operator values are the ones used before the introduction of this feature!

Leitmotiv

The operator parameter was introduced to offer users a flexible way to adapt current attribution methods or metrics. It should help them to empirically tackle new use cases and new tasks. Broadly speaking, it should amplify the user's ability to experiment. However, this also implies that it is the user's responsibility to make sure that their derivations stay within the scope of the original method and make sense.

operator in practice

In practice, the user does not manipulate the function itself. The use of the operator can be divided into three steps:

  • Specify the operator to use in the method initialization (as shown in the API description). Possible values are either an enum encoding the task, a string, or a custom operator (a function).
  • Make sure the model follows the model's specifications relative to the selected task.
  • Specify what to explain in the inputs through targets; the targets parameter specifications depend on the task.

The tasks covered

The operator parameter depends on the task to explain, as the function to explain depends on the task. Xplique natively supports the tasks in the following table, but new operators are welcome, please feel free to contribute. An instantiation example follows the table.

| Task and Documentation link | operator parameter value (from the xplique.Tasks Enum) | Tutorial link |
| :--- | :--- | :--- |
| Classification | CLASSIFICATION | Open In Colab |
| Object Detection | OBJECT_DETECTION | Open In Colab |
| Regression | REGRESSION | Open In Colab |
| Semantic Segmentation | SEMANTIC_SEGMENTATION | Open In Colab |
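For example, a minimal sketch (the choice of Occlusion and the regression_model variable are illustrative):

from xplique import Tasks
from xplique.attributions import Occlusion

# select the regression operator instead of the default classification one
explainer = Occlusion(regression_model, operator=Tasks.REGRESSION)
explanations = explainer(inputs, targets)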

Info

Classification is the default behavior, i.e., if no operator value is specified or None is given.

Warning

To apply Xplique to different tasks, specifying the value of the operator is not enough. Be sure to follow the "operator in practice" steps.

Operators' Signature

An operator is a function that we want to explain. This function takes \(3\) parameters as input:

  • the model to explain, as in the method instantiation (specifications in the model section).
  • the inputs parameter representing the samples to explain, as in the method call (specifications in the inputs section).
  • the targets parameter encoding what to explain in the inputs (specifications in the targets section).

This function should return a vector of scalar values of size \((N,)\), where \(N\) is the number of samples in inputs -- i.e. a scalar score per input.
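For instance, a sketch of a classification-style operator matching this signature (targets is assumed to be one-hot encoded):

import tensorflow as tf

@tf.function
def classification_operator(model, inputs, targets):
    # score of the class encoded by the one-hot `targets`: one scalar per input
    return tf.reduce_sum(model(inputs) * targets, axis=-1)  # shape (N,)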

Note

For gradient-based methods to work with the operator, it needs to be differentiable with respect to inputs.

The operators mechanism

Operators behavior for Black-box attribution methods

For attribution approaches that do not require gradient computation, we mostly need to query the model. Thus, those methods need an inference function. If you provide an operator, it will be used as the inference function.

More concretely, for this kind of approach, you want to compare some valued function between an original input and perturbed versions of it:

original_scores = operator(model, original_inputs, original_targets)

# depending on the attribution method, this `perturbation_function` is different
perturbed_inputs, perturbed_targets = perturbation_function(original_inputs, original_targets)
perturbed_scores = operator(model, perturbed_inputs, perturbed_targets)

# example of a comparison of interest: the absolute deviation of the scores
diff_scores = abs(original_scores - perturbed_scores)

Operators behavior for White-box attribution methods

These methods usually require some gradient computation. The gradients used are those of the operator function (see the get_gradient_of_operator function in the Providing custom operator section).

Providing custom operator

The operator parameter also supports functions (i.e. any Callable); this is considered a custom operator, and in this case you should be aware of the following points:

  • An assertion will be made to ensure it respects the operators' signature.
  • If you use any white-box explainer, your operator will go through the get_gradient_of_operator function below.
Code of the get_gradient_of_operator function.
def get_gradient_of_operator(operator):
    """
    Get the gradient of an operator.

    Parameters
    ----------
    operator
        Operator of which to compute the gradient.

    Returns
    -------
    gradient
        Gradient of the operator.
    """
    @tf.function
    def gradient(model, inputs, targets):
        with tf.GradientTape() as tape:
            tape.watch(inputs)
            scores = operator(model, inputs, targets)

        return tape.gradient(scores, inputs)

    return gradient

Tip

Writing your operator using only TensorFlow functions should increase the chance that this method does not yield any errors. In addition, providing a @tf.function decorator is also welcome!
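For example, a hedged sketch of a custom operator: the mean logit over several target classes. Here targets is assumed to be a multi-hot (N, num_classes) mask, and this operator is purely illustrative, not one shipped with Xplique.

import tensorflow as tf
from xplique.attributions import Saliency

@tf.function
def mean_class_operator(model, inputs, targets):
    # mean logit over the classes selected by the multi-hot `targets` mask
    selected = tf.reduce_sum(model(inputs) * targets, axis=-1)
    count = tf.reduce_sum(targets, axis=-1)
    return selected / count  # shape (N,), one scalar per input

explainer = Saliency(model, operator=mean_class_operator)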

Warning

The targets parameter is the key to specifying what to explain and differs greatly depending on the operator.

Models not respecting the specifications

Warning

In any case, when you are outside the scope of the original API, you should take a deep look at the source code to be sure that your use case makes sense.

My inputs follow a different shape convention

If you want to handle images or time-series data that do not follow the previous conventions, it is recommended to reshape the data to the shape expected by the explainers (attribution methods) so that they handle it correctly. Then, you can simply define a wrapper for your model so that the data is reshaped to your model's convenience when it is called.

For example, if you have a model that classifies images but expects channel-first images (i.e. with \((N, C, H, W)\) shape), then you should:

  • Move the axis so inputs are \((N, H, W, C)\) for the explainers
  • Write the following wrapper for your model:
Example of a wrapper.
import tensorflow as tf

class ModelWrapper(tf.keras.models.Model):
    def __init__(self, nchw_model):
        super(ModelWrapper, self).__init__()
        self.nchw_model = nchw_model

    def __call__(self, nhwc_inputs):
        # transform the NHWC inputs (expected by the explainers)
        # back to the NCHW inputs expected by the wrapped model
        nchw_inputs = self._transform_inputs(nhwc_inputs)
        # make predictions
        outputs = self.nchw_model(nchw_inputs)

        return outputs

    def _transform_inputs(self, nhwc_inputs):
        # include in this function all the transformations needed for
        # your model to work with NHWC inputs; here we move the axis
        # from channels-last to channels-first (tf.transpose keeps the
        # operation differentiable, unlike np.moveaxis)
        nchw_inputs = tf.transpose(nhwc_inputs, [0, 3, 1, 2])

        return nchw_inputs

wrapped_model = ModelWrapper(model)
explainer = Saliency(wrapped_model)
# images should be (N, H, W, C) for the explain call
explanations = explainer.explain(images, labels)

I have a PyTorch model

Then you should definitely take a look at the PyTorch documentation!

I have a model that is neither a tf.keras.Model nor a torch.nn.Module

Then you should take a look at the Callable documentation, or you could take inspiration from the PyTorch wrapper to write a wrapper that will integrate your model into our API!

inputs and data types

Warning

inputs in this section correspond to the argument in the explain method of BlackBoxExplainer. The model specified at the initialization of the BlackBoxExplainer should be able to be called through model(inputs). Otherwise, a wrapper needs to be implemented as described in the Models not respecting the specifications section.

inputs: Must be one of the following: a tf.data.Dataset (in which case you should not provide targets), a tf.Tensor or a np.ndarray.

Examples are provided in the different tutorials: images, time series, and tabular data. The conventions are as follows (a shape sketch is given after the list):

  • If inputs are images, the expected shape of inputs is \((N, H, W, C)\), following TF conventions, where:

    • \(N\): the number of inputs
    • \(H\): the height of the images
    • \(W\): the width of the images
    • \(C\): the number of channels (works for \(C=3\) or \(C=1\), other values might not work or need further customization)
  • If inputs are time series, the expected shape of inputs is \((N, T, W)\) where:

    • \(N\): the number of inputs
    • \(T\): the temporal dimension of a single input
    • \(W\): the feature dimension of a single input
  • If inputs are tabular data, the expected shape of inputs is \((N, W)\) where:

    • \(N\): the number of inputs
    • \(W\): the feature dimension of a single input
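To make these conventions concrete, a sketch with arbitrary dimensions:

import numpy as np

images      = np.empty((32, 224, 224, 3))  # images: (N, H, W, C)
time_series = np.empty((32, 100, 8))       # time series: (N, T, W)
tabular     = np.empty((32, 16))           # tabular data: (N, W)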

Tip

Please refer to the tables of available attribution methods to see which ones work with the different data types.

Note

If your model is not following the same conventions, please refer to the model not respecting the specification documentation.

targets

targets: Must be one of the following: a tf.Tensor or a np.ndarray. It has a shape of \((N, ...)\), where \(N\) should match the first dimension of inputs and \(...\) depends on the task and operator. Indeed, the targets parameter is highly dependent on the operator selected for the attribution method; hence, for more information, please refer to the tasks and operators table, which will lead you to the relevant task documentation page.
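For instance, with the default classification operator, targets is typically the one-hot encoding of the classes to explain (the class_indices and num_classes variables are assumed):

import tensorflow as tf

# one vector per input, encoding the class whose score is explained
targets = tf.one_hot(class_indices, depth=num_classes)  # shape (N, num_classes)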