Semantic segmentation explanations with Xplique¶
Attributions: Semantic segmentation tutorial
Which kind of tasks are supported by Xplique?¶
With the operator API, you can address many different tasks with Xplique; there is one operator for each task.
| Task and Documentation link | `operator` parameter value from `xplique.Tasks` Enum | Tutorial link |
| --- | --- | --- |
| Classification | `CLASSIFICATION` | |
| Object Detection | `OBJECT_DETECTION` | |
| Regression | `REGRESSION` | |
| Semantic Segmentation | `SEMANTIC_SEGMENTATION` | |
Info
They all share the common API of Xplique attribution methods.
Simple example¶
```python
import tensorflow as tf

import xplique
from xplique.utils_functions.segmentation import get_connected_zone
from xplique.attributions import Saliency
from xplique.metrics import Deletion

# load images and model
# ...

# extract targets individually
coordinates_of_object = (42, 42)
predictions = model(image)  # prediction for a single image, shape (h, w, c)
target = get_connected_zone(predictions, coordinates_of_object)

# add the batch dimension
inputs = tf.expand_dims(image, 0)
targets = tf.expand_dims(target, 0)

# compute explanations with the semantic segmentation operator
explainer = Saliency(model, operator=xplique.Tasks.SEMANTIC_SEGMENTATION)
explanations = explainer(inputs, targets)

# evaluate the explanations with the same operator
metric = Deletion(model, inputs, targets, operator=xplique.Tasks.SEMANTIC_SEGMENTATION)
score_saliency = metric(explanations)
```
How to use it?¶
To apply attribution methods, the common API documentation describes the parameters and how to set them. However, depending on the task, and thus on the `operator`, three points vary:

- The `operator` parameter value: an Enum or a string identifying the task,
- The model's output specification, as `model(inputs)` is used in the computation of the operators, and
- The `targets` parameter format: the `targets` parameter specifies what to explain, and the format of this specification depends on the task.
Info
Applying attribution methods to semantic segmentation with Xplique has a particularity: a set of functions from `utils_functions.segmentation` is used to define `targets`; these functions are documented in a dedicated section below.
The `operator`¶

How to specify it¶

In Xplique, to adapt attribution methods to a task, you specify that task via the `operator` parameter. In the case of semantic segmentation, use either:
Method(model, operator="semantic segmentation")
# or
Method(model, operator=xplique.Tasks.SEMANTIC_SEGMENTATION)
The computation¶

The operator for semantic segmentation is similar to the classification one, but the output is not a single class score: it is a matrix of class scores. The operator takes this spatial aspect into account and thus manipulates two elements:

- The zone of interest: it represents the zone/pixels on which we want the explanation to be made. It could be a single object like a person, a group of objects like trees, a part of an object that has been wrongly classified, or even the border of an object. Note that the concept of object here only makes sense for us, as the model only classifies pixels, which is why Xplique includes the segmentation utils functions.
- The class of interest: it represents the channel of the prediction we want to explain. Similarly to classification, we could want to explain either the cat or the dog in the same image. Note that in some cases, providing several classes could make sense; see the example of explaining the border between two objects.

Indeed, the semantic segmentation operator multiplies the model's predictions by the `targets`, which can be considered a mask. Then the operator divides the sum of the remaining predictions by the size of the mask. In short, the operator takes the mean of the predictions over the zone and class of interest, as in the sketch below.

Note that both pieces of information are communicated through the `targets` parameter.
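Conceptually, this computation can be sketched as follows (a minimal sketch, not Xplique's exact implementation; `targets` is assumed to take values in \(\{-1, 0, 1\}\)):

```python
import tensorflow as tf

def segmentation_operator_sketch(model, inputs, targets):
    # keep only the predictions in the zone and class of interest
    predictions = model(inputs)            # (n, h, w, c)
    masked = predictions * targets         # zero outside the zone/class
    # mean over the non-zero entries of the mask
    mask_size = tf.reduce_sum(tf.abs(targets), axis=(1, 2, 3))
    return tf.reduce_sum(masked, axis=(1, 2, 3)) / mask_size
```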
The behavior¶

- In the case of perturbation-based methods, the perturbation score is the difference between the operator's output for the studied `inputs` and for the perturbed inputs, where the operator's output is the mean logits value over the class and zone of interest.
- For gradient-based methods, the gradient of the mean of the model's predictions limited to the zone and class of interest is computed, as sketched below.
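For instance, a minimal sketch of what gradient-based methods compute, reusing the operator sketch above (`inputs` and `targets` as defined in the simple example):

```python
with tf.GradientTape() as tape:
    tape.watch(inputs)
    # mean prediction over the zone and class of interest
    scores = segmentation_operator_sketch(model, inputs, targets)
# gradient of that mean with respect to the input pixels
gradients = tape.gradient(scores, inputs)
```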
Model's output¶

We expect `model(inputs)` to yield a \((n, h, w, c)\) tensor or array where:

- \(n\): the number of inputs; it should match the first dimension of `inputs`
- \(h\): the height of the images
- \(w\): the width of the images
- \(c\): the number of classes

Warning

The model's output for each pixel is expected to be a soft output, not the class prediction nor a one-hot encoding of the class. Otherwise, the attribution methods will not be able to compare predictions efficiently.

Warning

Contrary to classification, a softmax or comparable last layer is necessary here, as zeros are interpreted by the operator as being outside the zone of interest. In this sense, strictly positive prediction values are required.
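If your model outputs raw logits instead, a minimal sketch of a fix is to append a softmax layer (`logits_model` is a hypothetical Keras segmentation model):

```python
import tensorflow as tf

# append a softmax so every class score is strictly positive
model = tf.keras.Sequential([
    logits_model,                      # hypothetical model outputting logits
    tf.keras.layers.Softmax(axis=-1),  # softmax over the classes dimension
])
```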
The `targets` parameter¶

Role¶

The `targets` parameter specifies what is to be explained in the `inputs`. It is passed to the `explain` or `__call__` method of an explainer or metric and used by the operators. In the case of semantic segmentation, the `targets` parameter communicates the two pieces of information needed by the semantic segmentation operator:

- The zone of interest: to communicate the zone of interest via the `targets` parameter, the `targets` values on pixels outside the zone of interest should be set to zero. In this way, `tf.math.sign(targets)` creates a mask of the zone of interest. This zeroing is done along the \(h\) and \(w\) dimensions of `targets`.
- The class of interest: similarly to the zone of interest, the class of interest is communicated by setting the other classes along the \(c\) dimension to zero.
Format¶

The `targets` parameter in the case of semantic segmentation should have the same shape as the model's output, as the operator combines the two element-wise. Hence, the shape is \((n, h, w, c)\) with:

- \(n\): the number of inputs; it should match the first dimension of `inputs`
- \(h\): the height of the images
- \(w\): the width of the images
- \(c\): the number of classes

It should take values in \(\{-1, 0, 1\}\): \(1\) in the zone of interest (along the \(h\) and \(w\) dimensions) and \(0\) elsewhere. Similarly, values not on the channel corresponding to the class of interest (dimension \(c\)) should be \(0\). In the case of the explanation of a border, or for contrastive explanations, \(-1\) values may be used.
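For illustration only, here is a hypothetical hand-built target for one image, explaining class 3 over a rectangular zone (in practice, the utils functions below build `targets` for you):

```python
import numpy as np
import tensorflow as tf

h, w, c = 64, 64, 10                      # must match the model's output shape
target = np.zeros((h, w, c), dtype=np.float32)
target[10:20, 30:40, 3] = 1.0             # zone of interest on the class-3 channel
targets = tf.expand_dims(tf.constant(target), 0)  # add the batch dimension
```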
In practice¶

The `targets` parameter is computed via the `xplique.utils_functions.segmentation` set of functions. They manipulate the model's predictions individually, as explanation requests differ between images. Please refer to the segmentation utils functions below for details on how to design `targets`.

Tip

You should not need to worry about this specification, as the segmentation utils functions do the work in your stead.

Warning

The `targets` parameter for each sample should be defined individually. The batch dimension should then be added manually, or the individual values should be stacked, as in the sketch below.
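A minimal sketch of the stacking approach, assuming `images` is a list of individual images, `model` is loaded, and class 1 is the hypothetical class to explain:

```python
import tensorflow as tf
from xplique.utils_functions.segmentation import get_class_zone

# build one target per image, then stack them into a batch
individual_targets = [
    get_class_zone(model(tf.expand_dims(image, 0))[0], class_id=1)
    for image in images
]
targets = tf.stack(individual_targets, axis=0)  # shape (n, h, w, c)
```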
The segmentation utils functions¶

The segmentation utils functions are a set of utility functions used to compute the `targets` parameter values. They should be applied to each image separately, as each segmentation is different and what to explain differs between images. Nonetheless, you can use `tf.map_fn` to apply the same function to several images, as in the sketch below.
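A minimal sketch with `tf.map_fn`, assuming `batched_images` is already loaded and class 1 is the hypothetical class to explain:

```python
# apply the same utils function to every prediction of the batch
batched_predictions = model(batched_images)  # (n, h, w, c)
targets = tf.map_fn(
    lambda predictions: get_class_zone(predictions, class_id=1),
    batched_predictions,
)
```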
An example of application of these functions can be found in the Attributions: Semantic segmentation tutorial.

For now, there are five functions:
get_class_zone¶

The simplest function: the class of interest is `class_id`, and the zone of interest corresponds to the pixels where this class is the argmax along the classes dimension of the model's prediction. This function can be used to design `targets` to explain:

- the class of a crowd of objects,
- the class of an object, if there is only one object in the image,
- the class of a set of objects, if there are few, locally close objects of the same class.

```python
get_class_zone(predictions: Union[tf.Tensor, np.ndarray],
               class_id: int) -> tf.Tensor
```

Extract a mask for the class `class_id`. The mask corresponds to the pixels where the maximum prediction corresponds to this class. The other class channels are set to zero.

Parameters

- `predictions` : Union[tf.Tensor, np.ndarray]
  Output of the model; it should be the output of a softmax function. We assume the shape (h, w, c).
- `class_id` : int
  Index of the channel of the class of interest.

Return

- `class_zone_mask` : tf.Tensor
  Mask of the zone corresponding to the class of interest. Only the corresponding channel is non-zero. The shape is the same as `predictions`, (h, w, c).
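A possible usage, assuming `model`, `image`, and the `explainer` from the simple example above, with class 1 as the hypothetical class to explain:

```python
predictions = model(tf.expand_dims(image, 0))[0]    # (h, w, c)
target = get_class_zone(predictions, class_id=1)
explanations = explainer(tf.expand_dims(image, 0),  # add batch dimensions
                         tf.expand_dims(target, 0))
```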
get_connected_zone¶

Here `coordinates` is a \((h, w)\) tuple that indicates the indices of a pixel of the image. The class of interest is the argmax along the classes dimension for this given pixel. The zone of interest is then the set of pixels with the same argmax class that form a connected zone with the indicated pixel. This function can be seen as selecting a zone by pointing at it. It can be used to design `targets` to explain:

- the class of an object,
- the class of a set of objects, if they are connected,
- the class of part of an object, if this part has been classified differently than the rest of the object and the surrounding objects.

```python
get_connected_zone(predictions: Union[tf.Tensor, np.ndarray],
                   coordinates: Tuple[int, int]) -> tf.Tensor
```

Extract a connected mask around `coordinates`. The mask corresponds to the pixels where the maximum prediction corresponds to the maximum predicted class at `coordinates`. This class mask is then limited to the connected zone around `coordinates`. The other class channels are set to zero.

Parameters

- `predictions` : Union[tf.Tensor, np.ndarray]
  Output of the model; it should be the output of a softmax function. We assume the shape (h, w, c).
- `coordinates` : Tuple[int, int]
  Coordinates of a point inside the zone of interest.

Return

- `connected_zone_mask` : tf.Tensor
  Mask of the connected zone around `coordinates` with similar class prediction. Only the corresponding channel is non-zero. The shape is the same as `predictions`, (h, w, c).
list_class_connected_zones¶

A mix of `get_class_zone` and `get_connected_zone`: `class_id` indicates the class of interest, and each connected zone for this class becomes a zone of interest (apart from zones whose size is under `zone_minimum_size`). It is useful for automated explainability treatment, but may generate explanations for zones we may not want to explain. Nonetheless, it can be used to design `targets` to explain the same elements as `get_connected_zone`.

Warning

Contrary to the other segmentation utils functions, the output here is a list of tensors.

```python
list_class_connected_zones(predictions: Union[tf.Tensor, np.ndarray],
                           class_id: int,
                           zone_minimum_size: int = 100) -> List[tf.Tensor]
```

List all connected zones for a given class. A connected zone is a set of adjacent pixels where the maximum prediction corresponds to the same class. This function generates a list of connected zones; each element of the list has a format similar to `get_connected_zone` outputs.

Parameters

- `predictions` : Union[tf.Tensor, np.ndarray]
  Output of the model; it should be the output of a softmax function. We assume the shape (h, w, c).
- `class_id` : int
  Index of the channel of the class of interest.
- `zone_minimum_size` : int = 100
  Threshold on the number of pixels under which zones are not returned.

Return

- `connected_zones_masks_list` : List[tf.Tensor]
  List of the connected zone masks for the given class. Each zone mask has the same shape as `predictions`, (h, w, c). Only the corresponding channel is non-zero.
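Since the output is a list, each zone can be explained separately; a possible sketch, assuming `predictions`, `image`, and `explainer` as defined above:

```python
zones = list_class_connected_zones(predictions, class_id=1, zone_minimum_size=100)
for target in zones:
    # one explanation per connected zone of class 1
    explanation = explainer(tf.expand_dims(image, 0), tf.expand_dims(target, 0))
```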
get_in_out_border¶

This function computes the `targets` needed to explain the border of an object. For this function, `class_zone_mask` encodes the class and the zone of interest. From this zone, the in-border (all pixels of the zone in contact with non-zone pixels) and the out-border (all non-zone pixels in contact with pixels of the zone) are computed. The in-border pixels are then set to \(1\) and the out-border pixels to \(-1\): through the operator, the in-border thus contributes the prediction values and the out-border their opposite. Therefore, explaining this border corresponds to explaining what increased the class predictions inside the zone and decreased them outside, along the borders of the zone.

```python
get_in_out_border(class_zone_mask: Union[tf.Tensor, np.ndarray]) -> tf.Tensor
```

Extract the border of a zone of interest, then put `1` on the inside border and `-1` on the outside border.

Parameters

- `class_zone_mask` : Union[tf.Tensor, np.ndarray]
  Mask delimiting the zone of interest for the class of interest; only one channel, the one corresponding to the class, should have non-zero values. We assume the shape (h, w, c), the same as the model output for one element.

Return

- `class_borders_masks` : tf.Tensor
  Mask of the borders of the zone of the class of interest. Only the corresponding channel is non-zero. Inside borders are set to `1` and outside borders to `-1`. The shape is the same as `class_zone_mask`, (h, w, c).
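A possible combination with `get_connected_zone`, assuming the object of interest contains the pixel (42, 42):

```python
zone = get_connected_zone(predictions, (42, 42))
border_target = get_in_out_border(zone)
explanations = explainer(tf.expand_dims(image, 0),
                         tf.expand_dims(border_target, 0))
```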
get_common_border¶

This function takes two borders computed via the previous function and limits the zone of interest to the common part of both zones of interest. The two classes of interest are kept, each on its own channel. This function thus enables the creation of `targets` to explain the border between two objects.

```python
get_common_border(border_mask_1: Union[tf.Tensor, np.ndarray],
                  border_mask_2: Union[tf.Tensor, np.ndarray]) -> tf.Tensor
```

Compute the common part between the `border_mask_1` and `border_mask_2` masks. Those borders should be computed using `get_in_out_border`.

Parameters

- `border_mask_1` : Union[tf.Tensor, np.ndarray]
  Border of the first zone of interest, computed with `get_in_out_border`.
- `border_mask_2` : Union[tf.Tensor, np.ndarray]
  Border of the second zone of interest, computed with `get_in_out_border`.

Return

- `common_borders_masks` : tf.Tensor
  Mask of the common border between the two zones of interest. Only the two corresponding channels are non-zero. Inside borders are set to `1` and outside borders to `-1`, respectively on the two channels. The shape is the same as the input border masks, (h, w, c).
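A possible end-to-end sketch, assuming the hypothetical points (42, 42) and (84, 84) fall inside two adjacent objects:

```python
border_1 = get_in_out_border(get_connected_zone(predictions, (42, 42)))
border_2 = get_in_out_border(get_connected_zone(predictions, (84, 84)))
common_border_target = get_common_border(border_1, border_2)
explanations = explainer(tf.expand_dims(image, 0),
                         tf.expand_dims(common_border_target, 0))
```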
What can be explained with it?¶
There are many things we may want to explain in semantic segmentation, and this section presents different possibilities. The segmentation utils functions allow the design of the `targets` parameter to specify what to explain.

Warning

The concept of object does not make sense for the model: a semantic segmentation model only classifies pixels. However, what humans want to explain are mainly objects, sets of objects, or parts of them.

Info

As objects do not make sense for the model, to stay coherent when manipulating objects, an object is assimilated to a connected zone of pixels. The only condition is that the predicted class on this connected zone is the same for all pixels.
For a concrete example, please refer to the Attributions: Semantic segmentation tutorial.
The class of an object¶
Here an object can be a person walking on a street, the dog by their side, or a car. However, what humans call an object does not make sense for the model; hence explaining an object corresponds to explaining a zone of interest where pixels have the same classification.

Warning

The zone should be extracted from the model's prediction, not from the labels.

To explain the difference between labels and predictions, there are two possibilities:

- either the difference is a single zone with a different class than its surroundings; this zone can then be considered an object,
- or the difference is more complex or mixed with other objects; then the zones in the union but not in the intersection of both should be considered objects and explained one by one. It is not recommended to treat them simultaneously.
The class of a set of objects¶
A set of objects can be a group of people walking down a street or a set of trees on one side of the road. There are three cases that can be considered sets of objects:

- A connected set of objects: it can be seen as one big zone and treated the same way as a single object.
- A locally close set of objects: this could also be considered one big zone, but it is harder to compute.
- A set of objects dispersed over the image and hardly countable: if there is a multitude of objects, it can be seen as a crowd of objects; otherwise, they should not be treated together.
The class of part of an object¶
A part of an object can be the leg of a person, the head of a dog, or a person in a group of people. This is interesting when the part and the object have been classified differently by the model. The part should then be considered an object, as in the first case above.
The class of a crowd of objects¶
A crowd is a set of hardly countable objects: it can be a set of clouds, a multitude of people on a sidewalk, or trees in a landscape.
The border of an object¶
The border of an object is the limit between the pixels inside the object and those outside of it. Here the object should correspond to a connected zone of pixels where the model predicts the same class.

It can be the contour of three people on the sidewalk or of trees in a landscape. It is interesting when the border is hard to define between similarly colored pixels or when the model's prediction is not precise.
The border between two objects¶
The border between two objects is the common part between the borders of two objects when those two objects are connected. This can be the border between a person and their wrongly classified leg.
Binary semantic segmentation¶
As described in the operator description, the output of the model should have a shape of \((n, h, w, c)\). However, in binary semantic segmentation, the two classes are often encoded by positive and negative values along a single channel, with an output shape of \((n, h, w)\).

The easiest way to apply Xplique to such a model is to wrap it to match the expected format. Suppose that the output of the binary semantic segmentation model has a shape of \((n, h, w)\), that negative values encode class \(0\), and that positive values encode class \(1\). Then the wrapper can take the form:
```python
import tensorflow as tf

class Wrapper:
    def __init__(self, model):
        self.model = model

    def __call__(self, inputs):
        # single-channel output: negative values encode class 0,
        # positive values encode class 1
        binary_segmentation = self.model(inputs)  # (n, h, w)
        class_0_mask = binary_segmentation < 0

        # split the single channel into two class channels
        divided = tf.stack(
            [-binary_segmentation * tf.cast(class_0_mask, tf.float32),
             binary_segmentation * tf.cast(tf.logical_not(class_0_mask), tf.float32)],
            axis=-1)  # (n, h, w, 2)

        # the softmax ensures strictly positive class scores
        return tf.nn.softmax(divided, axis=-1)

wrapped_model = Wrapper(binary_seg_model)
```