Classification explanations with Xplique¶
Which kind of tasks are supported by Xplique?¶
With the operator API, you can address many different tasks with Xplique. There is one operator for each task.
| Task | `operator` parameter value (from the `xplique.Tasks` Enum) |
|---|---|
| Classification | `CLASSIFICATION` |
| Object Detection | `OBJECT_DETECTION` |
| Regression | `REGRESSION` |
| Semantic Segmentation | `SEMANTIC_SEGMENTATION` |
Info
All of these operators share the common API for Xplique attribution methods.
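Since the API is shared, switching tasks only requires changing the `operator` value. For instance, a minimal sketch for a regression task, assuming `model`, `inputs`, and `targets` are already defined for that task:

```python
import xplique
from xplique.attributions import Saliency

# same attribution method, different task: only the operator changes
explainer = Saliency(model, operator=xplique.Tasks.REGRESSION)
explanations = explainer(inputs, targets)
```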
Simple example¶
```python
import tensorflow as tf

import xplique
from xplique.attributions import Saliency
from xplique.metrics import Deletion

# load inputs and model
# ...

# for classification, it is recommended to remove the softmax layer if there is one
# model.layers[-1].activation = tf.keras.activations.linear

# for classification, `targets` is the one-hot encoding of the predicted class
targets = tf.one_hot(tf.argmax(model(inputs), axis=-1), depth=nb_classes, axis=-1)

# compute explanations by specifying the classification operator
explainer = Saliency(model, operator=xplique.Tasks.CLASSIFICATION)
explanations = explainer(inputs, targets)

# compute metrics on those explanations
# if the softmax was removed, it is possible to specify it
# to obtain more interpretable metrics
metric = Deletion(model, inputs, targets,
                  operator=xplique.Tasks.CLASSIFICATION, activation="softmax")
score_saliency = metric(explanations)
```
Tip
In general, for classification tasks, it is better not to include the final softmax layer in your model and to work with logits instead!
How to use it?¶
To apply attribution methods, the common API documentation describes the parameters and how to set them. However, depending on the task, and thus on the `operator`, three points vary:

- The `operator` parameter value: an Enum or a string identifying the task,
- The model's output specification, as `model(inputs)` is used in the computation of the operators, and
- The `targets` parameter format: the `targets` parameter specifies what to explain, and the format of this specification depends on the task.
The `operator`¶
How to specify it¶
In Xplique, to adapt attribution methods to a task, you specify it via the `operator` parameter. In the case of classification, use either:

```python
Method(model)
# or
Method(model, operator="classification")
# or
Method(model, operator=xplique.Tasks.CLASSIFICATION)
```
Info
Classification is the default behavior of Xplique attribution methods, hence there is no need to specify it. Nonetheless, it is recommended to do so to make explicit what is being explained.
The computation¶
The classification operator multiplies the model's predictions on `inputs` with `targets` and sums the result for each input to explain. Since only one value in `targets` should be non-zero, the classification operator returns the model output for the class specified via `targets`.

```python
scores = tf.reduce_sum(model(inputs) * targets, axis=-1)
```
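To make the computation concrete, here is a toy illustration with made-up logits for two samples and three classes:

```python
import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5],   # sample 1
                      [0.1, 3.0, -0.5]])  # sample 2
targets = tf.constant([[1.0, 0.0, 0.0],   # explain class 0 for sample 1
                       [0.0, 1.0, 0.0]])  # explain class 1 for sample 2

scores = tf.reduce_sum(logits * targets, axis=-1)
# scores == [2.0, 3.0]: the logit of the targeted class for each sample
```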
The behavior¶
- In the case of perturbation-based methods, the perturbation score corresponds to the difference between the initial logit value for the predicted class and the same logit for predictions over perturbed inputs.
- For gradient-based methods, the explanation is based on the gradient of the logits of interest with respect to the inputs.

The logits of interest are specified via the `targets` parameter described in the related section.
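As an illustration, here is a minimal sketch of a Saliency-like gradient computation built on this operator, assuming `model`, `inputs`, and `targets` are defined as in the simple example:

```python
import tensorflow as tf

inputs = tf.convert_to_tensor(inputs)  # ensure inputs can be watched by the tape

with tf.GradientTape() as tape:
    tape.watch(inputs)
    # classification operator: logit of the targeted class for each sample
    scores = tf.reduce_sum(model(inputs) * targets, axis=-1)

gradients = tape.gradient(scores, inputs)  # same shape as inputs
saliency_map = tf.abs(gradients)
```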
Model's output¶
We expect `model(inputs)` to yield a \((n, c)\) tensor or array where \(n\) is the number of input samples and \(c\) is the number of classes.
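As a quick sanity check, a minimal sketch assuming `inputs` and `nb_classes` from the simple example:

```python
predictions = model(inputs)
# one row of c class scores per input sample
assert predictions.shape == (inputs.shape[0], nb_classes)
```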
The `targets` parameter¶
Role¶
The `targets` parameter specifies what is to be explained in the `inputs`; it is passed to the `explain` or `__call__` method of an explainer or metric and used by the operators. In the case of classification, it indicates the class to explain, or specifies contrastive explanations.
Format¶
In the case of classification, the `targets` parameter should have the same shape as the model's output, as they are multiplied point-wise. Hence, the shape is \((n, c)\) with \(n\) the number of samples to explain (it should match the first dimension of `inputs`) and \(c\) the number of classes. The `targets` parameter expects values in \(\{-1, 0, 1\}\), but most values should be \(0\), and most of the time exactly one value should be \(1\) for each sample. \(-1\) values are only used for contrastive explanations.
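For example, a valid `targets` tensor for three samples and three classes could be built from hypothetical integer class indices:

```python
import tensorflow as tf

labels = tf.constant([0, 2, 1])  # hypothetical class indices, one per sample
nb_classes = 3

targets = tf.one_hot(labels, depth=nb_classes)
# shape (3, 3), with exactly one 1 per row and 0 elsewhere
```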
In practice¶
In the simple example, the `targets` value provided is computed with `tf.one_hot(tf.argmax(model(inputs), axis=-1), depth=nb_classes, axis=-1)`: literally, the one-hot encoding of the predicted class, which specifies the class to explain.
Tip
It is better to explain the predicted class than the expected class, as the goal is to explain the model's prediction.
What can be explained with it?¶
Explain the predicted class¶
By specifying `targets` with a one-hot encoding of the predicted class, the explanation will highlight which features were important for this prediction.
Contrastive explanations¶
By specifying `targets` with zeros everywhere except a \(1\) for the first class and a \(-1\) for the second class, the explanation will show which features were important to predict the first class and not the second one.
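For instance, a minimal sketch of a contrastive `targets` row, here with hypothetical class indices 0 (the class to explain) versus 2 (the class to contrast against):

```python
import tensorflow as tf

nb_classes = 3
first_class, second_class = 0, 2

contrastive_target = (tf.one_hot([first_class], depth=nb_classes)
                      - tf.one_hot([second_class], depth=nb_classes))
# contrastive_target == [[1., 0., -1.]]
```

Repeating this row for each sample yields a valid `targets` tensor for the explainer from the simple example.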
Tip
If the model made a mistake, an interesting explanation contrasts the predicted class with the expected class, as sketched below.
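A minimal sketch of such targets, assuming `model`, `inputs`, ground-truth integer `labels`, and `nb_classes` are available:

```python
import tensorflow as tf

predicted = tf.argmax(model(inputs), axis=-1)
expected = labels  # assumed ground-truth integer class indices

# 1 for the predicted class, -1 for the expected class, 0 elsewhere;
# rows where the model is correct become all zeros, so keep only the mistakes
targets = tf.one_hot(predicted, depth=nb_classes) - tf.one_hot(expected, depth=nb_classes)
```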