
Kernel Shap

View colab tutorial | View source

By appropriately setting the perturbation function, the similarity kernel and the interpretable model in the LIME framework, we can theoretically obtain the Shapley values more efficiently. KernelShap is therefore a LIME-based method with specific attributes.

Quote

The exact computation of SHAP values is challenging. However, by combining insights from current additive feature attribution methods, we can approximate them. We describe two model-agnostic approximation methods, [...] and another that is novel (Kernel SHAP)

-- A Unified Approach to Interpreting Model Predictions1

Example

from xplique.attributions import KernelShap

# load images, labels and model
# define a custom map_to_interpret_space function
# ...

method = KernelShap(model, map_to_interpret_space=custom_map)
explanations = method.explain(images, labels)

The choice of the map function has a great influence on the quality of the explanation. By default, the map function uses the quickshift segmentation from scikit-image.
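To get a feel for what the default mapping produces, the sketch below runs scikit-image's quickshift segmentation on a single image. The segmentation parameters are illustrative and are not necessarily the exact values used internally by Xplique.

import numpy as np
from skimage.segmentation import quickshift

# a single (H, W, C) image with values in [0, 1]; replace with one of your own images
image = np.random.rand(224, 224, 3)

# each pixel receives a super-pixel label;
# pixels sharing a label form one interpretable feature
segments = quickshift(image, kernel_size=4, max_dist=200, ratio=0.2)
print(segments.shape)             # (224, 224): one label per pixel, no channel axis
print(len(np.unique(segments)))   # number of interpretable features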

Notebooks

KernelShap

By appropriately setting the perturbation function, the similarity kernel and the interpretable model in the LIME framework, we can theoretically obtain the Shapley values more efficiently. KernelShap is therefore a LIME-based method with specific attributes.

__init__(self,
         model: Callable,
         batch_size: int = 64,
         operator: Union[xplique.commons.operators.Tasks, str,
         Callable[[keras.src.engine.training.Model, tf.Tensor, tf.Tensor], float], None] = None,
         map_to_interpret_space: Optional[Callable] = None,
         nb_samples: int = 800,
         ref_value: Optional[numpy.ndarray] = None)

Parameters

  • model : Callable

    • The model from which we want to obtain explanations.

  • batch_size : int = 64

    • Number of perturbed samples to process at once; mandatory when nb_samples is huge.

      Note that this is different from WhiteBox explainers, which batch the inputs: here the inputs are processed one by one.

  • operator : Union[xplique.commons.operators.Tasks, str, Callable[[keras.src.engine.training.Model, tf.Tensor, tf.Tensor], float], None] = None

    • Function g to explain. g takes 3 parameters (f, x, y) and should return a scalar, with f the model, x the inputs and y the targets. If None, the standard operator g(f, x, y) = f(x)[y] is used. A construction sketch follows this parameter list.

  • map_to_interpret_space : Optional[Callable] = None

    • Function which groups the features of an input that correspond to the same interpretable feature (e.g. a super-pixel).

      It allows transposing from (resp. to) the original input space to (resp. from) the interpretable space.

  • nb_samples : int = 800

    • The number of perturbed samples to generate for each input sample.

      Defaults to 800.

  • ref_value : Optional[numpy.ndarray] = None

    • Defines the reference value which replaces each feature when the corresponding interpretable feature is set to 0.

      It should be provided as an ndarray of shape (1) if there are no channels in your input, and (C,) otherwise.

      The default reference value is (0.5, 0.5, 0.5) for inputs with 3 channels (corresponding to a grey pixel when inputs are normalized by 255), and 0 otherwise.
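
As a concrete illustration of these parameters, the sketch below builds an explainer with them set explicitly. It assumes model is an already-loaded Keras classifier on normalized 3-channel images; the lambda simply re-expresses the default operator g(f, x, y) = f(x)[y] to show the expected call signature.

import numpy as np
import tensorflow as tf
from xplique.attributions import KernelShap

# model: an already-loaded Keras classifier (assumption for this sketch)
# the operator re-expresses the default g(f, x, y) = f(x)[y]:
# the score of the target class, one scalar per sample
custom_operator = lambda f, x, y: tf.reduce_sum(f(x) * y, axis=-1)

explainer = KernelShap(model,
                       batch_size=64,
                       operator=custom_operator,
                       nb_samples=800,
                       ref_value=np.array([0.5, 0.5, 0.5]))  # grey reference pixel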

explain(self,
        inputs: Union[tf.Dataset, tf.Tensor, numpy.ndarray],
        targets: Union[tf.Tensor, numpy.ndarray, None] = None) -> tf.Tensor

This method attributes the output of the model, for the given targets, to the inputs of the model using the approach described above: it trains an interpretable model and returns a representation of that interpretable model.

Parameters

  • inputs : Union[tf.Dataset, tf.Tensor, numpy.ndarray]

    • Dataset, Tensor or Array. Input samples to be explained.

      If Dataset, targets should not be provided (included in Dataset).

      Expected shape among (N, W), (N, T, W), (N, H, W, C).

      More information in the documentation.

  • targets : Union[tf.Tensor, numpy.ndarray, None] = None

    • Tensor or Array. One-hot encoding of the model's output from which an explanation is desired. One encoding per input and only one output at a time. Therefore, the expected shape is (N, output_size).

      More information in the documentation.

Return

  • explanations : tf.Tensor

    • Coefficients of the interpretable model, with the same shape as the inputs except for the channels.

      The coefficients live in the interpretable space, so features which were grouped together (e.g. belonging to the same super-pixel) receive the same value. A usage sketch follows.
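
A minimal usage sketch, assuming a 10-class classifier, (N, H, W, C) image inputs and integer labels; the variable names are placeholders, and explainer is the instance built earlier.

import tensorflow as tf

# one-hot encode the integer labels to match the expected (N, output_size) shape
targets = tf.one_hot(labels, depth=10)

explanations = explainer.explain(images, targets)
print(explanations.shape)  # (N, H, W): one coefficient per pixel, channels dropped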


Warning

The computation time might be very long depending on the hyperparameter settings. A huge number of perturbed samples and a fine-grained mapping may lead to better results, but they take long to compute.

Parameters in-depth

map_to_interpret_space:

Function which groups the features of an input that correspond to the same interpretable feature (e.g. a super-pixel).

It allows transposing from (resp. to) the original input space to (resp. from) the interpretable space.

The default mappings are:

  • the quickshift segmentation algorithm for inputs with \((N, W, H, C)\) shape; we assume such a shape represents \((W, H, C)\) images.
  • the felzenszwalb segmentation algorithm for inputs with \((N, W, H)\) shape; we assume such a shape represents \((W, H)\) images.
  • an identity mapping if inputs have shape \((N, W)\); we assume here your inputs are tabular data.

To use your own custom map function you should use the following scheme:

def custom_map_to_interpret_space(inputs: tf.Tensor) -> tf.Tensor:
    # **some grouping technique** producing one group index per feature
    return mappings

mappings should have the same dimensions as the inputs, except for the channels.

For instance, you can use the scikit-image library (as we did for the quickshift algorithm) to define super-pixels on your images.
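
Below is a hedged sketch of such a custom mapping built on scikit-image's felzenszwalb segmentation, following the scheme above; the segmentation parameters are illustrative rather than Xplique defaults, and a recent scikit-image (with the channel_axis argument) is assumed.

import tensorflow as tf
from skimage.segmentation import felzenszwalb

def custom_map_to_interpret_space(inputs: tf.Tensor) -> tf.Tensor:
    # one segmentation map per (H, W, C) image; the channel axis is dropped
    mappings = []
    for single_input in inputs:
        segments = felzenszwalb(single_input.numpy(),
                                scale=100, sigma=0.5, min_size=50,
                                channel_axis=-1)
        mappings.append(segments)
    # resulting shape: (N, H, W), one group index per pixel
    return tf.stack(mappings)

Such a function can then be passed through the map_to_interpret_space argument of the constructor, as in the first example above.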

Info

The quality of your explanation relies strongly on this mapping.

Warning

Depending on the mapping, you might end up with a huge number of interpretable features (e.g. if you map pixels 2 by 2 on a 299x299 image). Thus, the computation time might be very long!

Danger

As you may have noticed, Time Series are not handled by default. Consequently, a custom mapping should be implemented: either assign each feature to a different group, or group consecutive features together, for example by groups of 4 timesteps. The second option tries to cover patterns. An example of the first option is provided below, followed by a sketch of the second.

import tensorflow as tf

def map_time_series(single_input: tf.Tensor) -> tf.Tensor:
    # assign each (timestep, feature) cell its own group:
    # every feature becomes a distinct interpretable feature
    time_dim = single_input.shape[0]
    feat_dim = single_input.shape[1]
    mapping = tf.range(time_dim * feat_dim)
    mapping = tf.reshape(mapping, (time_dim, feat_dim))
    return mapping
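
And here is a hedged sketch of the second option: consecutive timesteps are grouped by windows (4 by 4 here, an illustrative choice) so that each interpretable feature covers a short pattern of one feature over time. The function name and window argument are ours, not part of the library.

import tensorflow as tf

def map_time_series_by_window(single_input: tf.Tensor, window: int = 4) -> tf.Tensor:
    time_dim = single_input.shape[0]
    feat_dim = single_input.shape[1]
    # window index of each timestep: 0, 0, 0, 0, 1, 1, 1, 1, ...
    time_groups = tf.range(time_dim) // window
    feat_groups = tf.range(feat_dim)
    # each (window, feature) pair becomes one interpretable feature
    mapping = time_groups[:, tf.newaxis] * feat_dim + feat_groups[tf.newaxis, :]
    return mapping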