Object detection with Xplique¶
Attributions: Object Detection tutorial
Which kind of tasks are supported by Xplique?¶
With the operator API, you can treat many different problems with Xplique. There is one operator for each task.
Task and Documentation link | operator parameter value from xplique.Tasks Enum | Tutorial link |
---|---|---|
Classification | CLASSIFICATION | |
Object Detection | OBJECT_DETECTION | |
Regression | REGRESSION | |
Semantic Segmentation | SEMANTIC_SEGMENTATION | |
Info
They all share the API for Xplique attribution methods.
Simple example¶
import tensorflow as tf

import xplique
from xplique.attributions import Saliency
from xplique.metrics import Deletion

# load images and model
# ...

predictions = model(images)

explainer = Saliency(model, operator=xplique.Tasks.OBJECT_DETECTION)

# explain each (image, bounding boxes) pair separately
for all_bbx_for_one_image, image in zip(predictions, images):
    # an image is needed per bounding box, so we tile it
    repeated_image = tf.tile(tf.expand_dims(image, axis=0),
                             (tf.shape(all_bbx_for_one_image)[0], 1, 1, 1))

    explanations = explainer(repeated_image, all_bbx_for_one_image)

    # either compute one score per image (as here) or
    # concatenate repeated images and corresponding boxes into one tensor
    metric_for_one_image = Deletion(model, repeated_image, all_bbx_for_one_image,
                                    operator=xplique.Tasks.OBJECT_DETECTION)
    score_saliency = metric_for_one_image(explanations)
How to use it?¶
To apply attribution methods, the common API documentation describes the parameters and how to set them. However, depending on the task, and thus on the operator, three points vary:

- The operator parameter value: an Enum or a string identifying the task,
- The model's output specification, as model(inputs) is used in the computation of the operators, and
- The targets parameter format: the targets parameter specifies what to explain, and the format of this specification depends on the task.
The operator¶
How to specify it¶
In Xplique, to adapt attribution methods, you should specify the task via the operator parameter. In the case of object detection, use either:

Method(model, operator="object detection")
# or
Method(model, operator=xplique.Tasks.OBJECT_DETECTION)
Info
There are several variants of the object detection operator to explain part of the prediction.
The computation¶
This operator generalizes the DRise method introduced by Petsiuk et al. [^1] to most attribution methods. The computation is the same as the one described in the DRise paper. DRise can be divided into two principles:

- The matching: DRise extends Rise (described in detail in the Rise tutorial) to explain object detection. Rise is a perturbation-based method, hence current predictions are compared to predictions on perturbed inputs. However, object detectors predict several boxes with no consistency in their order, thus DRise matches the current bounding box to the most similar one and uses the similarity metric as the perturbation score.
- The similarity metric: This is the score used by DRise to match bounding boxes. It uses the three parts of a bounding box prediction: the position of the box, the box objectness, and the associated class. A score is computed for each of those three parts and these scores are multiplied:

$$ score = intersection\_score * detection\_probability * classification\_score $$

With: $$ intersection\_score = IOU(coordinates_{ref}, coordinates_{pred}) $$
Info
The intersection score of the operator is the IOU (Intersection Over Union) by default, but it can be modified by specifying a custom intersection score.
Info
With the DRise formula, the methods explain the box position, the box objectness, and the class prediction at the same time. However, the user may want to explain them separately, therefore several variants of this operator are available in Xplique and described in the What can be explained and how? section.
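To make the similarity metric concrete, here is a minimal sketch of how such a score could be computed for one reference box against the boxes predicted on a perturbed image. It assumes the classification score is the cosine similarity between the class probability vectors (as in the DRise paper); compute_iou and drise_similarity are illustrative helpers, not Xplique functions.

```python
import tensorflow as tf

def compute_iou(box_ref, boxes_pred):
    # boxes encoded as (x_top_left, y_top_left, x_bottom_right, y_bottom_right)
    x1 = tf.maximum(box_ref[0], boxes_pred[:, 0])
    y1 = tf.maximum(box_ref[1], boxes_pred[:, 1])
    x2 = tf.minimum(box_ref[2], boxes_pred[:, 2])
    y2 = tf.minimum(box_ref[3], boxes_pred[:, 3])
    intersection = tf.maximum(x2 - x1, 0.0) * tf.maximum(y2 - y1, 0.0)
    area_ref = (box_ref[2] - box_ref[0]) * (box_ref[3] - box_ref[1])
    area_pred = (boxes_pred[:, 2] - boxes_pred[:, 0]) * (boxes_pred[:, 3] - boxes_pred[:, 1])
    return intersection / (area_ref + area_pred - intersection)

def drise_similarity(target_box, perturbed_predictions):
    # target_box: (4 + 1 + nb_classes,), the bounding box to explain
    # perturbed_predictions: (nb_boxes, 4 + 1 + nb_classes), model output on a perturbed image
    intersection_score = compute_iou(target_box[:4], perturbed_predictions[:, :4])
    detection_probability = perturbed_predictions[:, 4]
    classification_score = tf.reduce_sum(
        tf.nn.l2_normalize(target_box[5:])
        * tf.nn.l2_normalize(perturbed_predictions[:, 5:], axis=-1),
        axis=-1)
    # matching: keep the score of the most similar predicted box
    return tf.reduce_max(intersection_score * detection_probability * classification_score)
```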
The behavior¶
- In the case of perturbation-based methods, the perturbation score is the aforementioned similarity metric.
- For gradient-based methods, the gradient of the similarity metric is computed, but no matching is necessary as no perturbation is made.
Model's output¶
We expect model(inputs) to yield a \((n, nb\_boxes, 4 + 1 + nb\_classes)\) tensor or array where:

- \(n\): the number of inputs; it should match the first dimension of inputs,
- \(nb\_boxes\): a fixed number of bounding boxes predicted for a given image (no NMS),
- \((4 + 1 + nb\_classes)\): the encoding of a bounding box prediction:
    - \(4\): the bounding box coordinates \((x_{top\_left}, y_{top\_left}, x_{bottom\_right}, y_{bottom\_right})\), with \(x_{top\_left} < x_{bottom\_right}\) and \(y_{top\_left} < y_{bottom\_right}\),
    - \(1\): the objectness or detection probability of the bounding box,
    - \(nb\_classes\): the class of the bounding box, as soft class predictions, not a one-hot encoding.
Warning
Object detection models provided to the explainer should not include NMS, and the classification should be a soft classification, not a one-hot encoding. Furthermore, if the model does not match the expected format, a wrapper may be needed (see the tutorial for an example).
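As an illustration, a minimal wrapper could look like the sketch below. It assumes a hypothetical raw_detector that returns, before NMS, boxes of shape (n, nb_boxes, 4), objectness of shape (n, nb_boxes), and soft class scores of shape (n, nb_boxes, nb_classes); these names and shapes are assumptions for the example, not an Xplique API.

```python
import tensorflow as tf

class DetectorWrapper(tf.keras.Model):
    """Reformats raw detector outputs into the expected (n, nb_boxes, 4 + 1 + nb_classes)."""

    def __init__(self, raw_detector):
        super().__init__()
        self.raw_detector = raw_detector

    def call(self, inputs):
        # assumed raw outputs (before NMS): boxes (n, nb_boxes, 4),
        # objectness (n, nb_boxes), soft class scores (n, nb_boxes, nb_classes)
        boxes, objectness, class_scores = self.raw_detector(inputs)
        return tf.concat([boxes, objectness[..., tf.newaxis], class_scores], axis=-1)

# the wrapped model can then be passed to an explainer, e.g.
# explainer = Saliency(DetectorWrapper(raw_detector), operator=xplique.Tasks.OBJECT_DETECTION)
```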
Info
PyTorch models are not natively handled by Xplique; however, a simple wrapper is available in the PyTorch documentation.
The targets parameter¶
Role¶
The targets parameter specifies what is to be explained in the inputs. It is passed to the explain or __call__ method of an explainer or metric and used by the operators. In the case of object detection, it indicates which box to explain; furthermore, it gives the operator the initial predictions used as the reference for perturbation-based methods.
Format¶
The targets parameter in the case of object detection follows the same encoding as the model's output, as the same computations are made. Concretely, the targets parameter should have a shape of \((n, 4 + 1 + nb\_classes)\) to explain one bounding box for each input (the encoding is detailed in the model's output description).
Additionally, it is possible to explain a group of bounding boxes at the same time, as described in the explaining several bounding boxes section, which requires a different shape.
In practice¶
To explain each bounding box individually, the images need to be repeated. Indeed, object detectors predict several bounding boxes per image, and the first dimensions of inputs and targets should match, as they correspond to the sample dimension. Therefore, the easiest way to achieve this is to repeat each image so that it matches the number of bounding boxes to explain for this image.
In the simple example, there is a loop over the image - predictions pairs, then images are repeated to match the number of predicted bounding boxes, and finally, the targets parameter takes the predicted bounding boxes.
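Reusing the names from the simple example, the shape constraint can be made explicit (a minimal sketch):

```python
# repeated_image:        (nb_boxes_of_this_image, height, width, channels)
# all_bbx_for_one_image: (nb_boxes_of_this_image, 4 + 1 + nb_classes)
# the first (sample) dimensions must match before calling the explainer
assert repeated_image.shape[0] == all_bbx_for_one_image.shape[0]
explanations = explainer(repeated_image, all_bbx_for_one_image)
```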
Tip
As specified in the model's output specification, NMS (Non-Maximum Suppression) should not be included in the model. However, it can be used to select the bounding boxes to explain.
Warning
Repeating images may create a tensor that exceeds memory for large images and/or when many bounding boxes are to be explained. In this case, we advise looping over the images, then over the boxes.
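Building on the simple example, the sketch below combines both remarks: TensorFlow's tf.image.non_max_suppression is applied outside the model only to select the boxes to explain, and a second loop over the boxes keeps the repeated-images tensor small. The max_output_size and iou_threshold values are arbitrary choices for the illustration.

```python
import tensorflow as tf

for all_bbx_for_one_image, image in zip(predictions, images):
    # NMS outside the model, only to choose which boxes will be explained
    # (consistently swapped x/y coordinates do not change the IOU used by NMS)
    kept_indices = tf.image.non_max_suppression(
        boxes=all_bbx_for_one_image[:, :4],
        scores=all_bbx_for_one_image[:, 4],
        max_output_size=10,
        iou_threshold=0.5)
    boxes_to_explain = tf.gather(all_bbx_for_one_image, kept_indices)

    # loop over the selected boxes to avoid building a large repeated-images tensor
    for box in boxes_to_explain:
        explanation = explainer(image[tf.newaxis], box[tf.newaxis])
```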
Explain several bounding boxes simultaneously¶
The user may not want to explain each bounding box individually but several bounding boxes at the same time (e.g., a set of pedestrian bounding boxes on a sidewalk). In this case, the targets parameter shape will not be \((n, 4 + 1 + nb\_classes)\) but \((n, nb\_boxes, 4 + 1 + nb\_classes)\), with \(nb\_boxes\) the number of boxes to explain simultaneously. Here, \(nb\_boxes\) bounding boxes are associated with each sample and a single attribution map is returned per sample. However, \(nb\_boxes\) may differ between images, which can make it impossible to build a single tensor. Thus, we recommend treating each group of bounding boxes with a separate call to the attribution method with \(n=1\).
To return one explanation for several bounding boxes, Xplique takes the mean of the boxes' individual explanations and returns it.
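For instance, to explain a group of boxes on a single image, the batch dimension can simply be added to the group (a minimal sketch, where pedestrian_boxes is a hypothetical (nb_boxes, 4 + 1 + nb_classes) tensor of selected predictions for this image):

```python
import tensorflow as tf

# a single sample (n = 1) associated with a group of nb_boxes bounding boxes
group_inputs = image[tf.newaxis]              # (1, height, width, channels)
group_targets = pedestrian_boxes[tf.newaxis]  # (1, nb_boxes, 4 + 1 + nb_classes)

# a single attribution map is returned for the whole group
group_explanation = explainer(group_inputs, group_targets)
```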
For a concrete example, please refer to the Attributions: Object detection tutorial.
What can be explained and how?¶
The different elements in object detection¶
In object detection, the prediction for a given bounding box includes several pieces of information: the box position, the probability of the box containing an object, and the class of the detected object. Therefore, we may want to explain each of them separately; however, the DRise way of matching bounding boxes should be kept in mind. Indeed, the box position cannot be removed from the score, otherwise the explanation may not correspond to the same object.
The different operator's variants and what they explain¶
The Xplique library allows specifying which part of the prediction to explain via a set of 4 operators: the one defined by the DRise formula and three variants:

- "object detection": the one described in the operator section: \[score = intersection\_score * detection\_probability * classification\_score\]
- "object detection box position": explains only the bounding box position: \[score = intersection\_score\]
- "object detection box proba": explains the probability of a bounding box to contain something: \[score = intersection\_score * detection\_probability\]
- "object detection box class": explains the class of a bounding box: \[score = intersection\_score * classification\_score\]
Custom intersection score¶
The default intersection score is the IOU, but it is possible to define a custom intersection score. The only constraint is that it should follow the signature of xplique.commons.object_detection_operator._box_iou for it to work.
from xplique.attributions import Saliency
from xplique.commons.operators import object_detection_operator

# define a custom intersection score following the _box_iou signature
custom_intersection_score = ...

# wrap the object detection operator so it uses the custom intersection score
custom_operator = lambda model, inputs, targets: object_detection_operator(
    model, inputs, targets, intersection_score=custom_intersection_score
)

explainer = Saliency(model, operator=custom_operator)
...  # all following steps are the same as in the examples
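As an illustration only, the sketch below uses the distance between box centers instead of the IOU as a custom score. The assumed signature (two tensors of box coordinates returning one score per box) should be checked against _box_iou in your Xplique version.

```python
import tensorflow as tf

def custom_intersection_score(boxes_ref, boxes_pred):
    # assumed signature: two (nb_boxes, 4) coordinate tensors encoded as
    # (x_top_left, y_top_left, x_bottom_right, y_bottom_right),
    # returning one score per box pair
    centers_ref = (boxes_ref[..., :2] + boxes_ref[..., 2:4]) / 2.0
    centers_pred = (boxes_pred[..., :2] + boxes_pred[..., 2:4]) / 2.0
    distances = tf.norm(centers_ref - centers_pred, axis=-1)
    # closer box centers give a score closer to 1
    return 1.0 / (1.0 + distances)
```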
[^1]: Petsiuk et al., Black-box Explanation of Object Detectors via Saliency Maps (2021)