TCAV¶

TCAV, or Testing with Concept Activation Vectors, uses a concept activation vector (CAV) to quantify the relationship between the concept it encodes and a class.

This is done by computing the directional derivative of the class logit along the concept vector on several samples of a given class and measuring the percentage of samples for which it is positive. A positive directional derivative indicates that an infinitesimal addition of the concept increases the logit of the class.

For a Concept Activation Vector \(v_l\) of a layer \(f_l\) of a model, and \(f_c\) the logit of the class \(c\), we measure the directional derivative \(S_c(x) = v_l \cdot \frac{\partial f_c(x)}{\partial f_l(x)}\).
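To make the directional derivative concrete, here is a minimal pure-Python sketch on a toy model. The head `class_logit`, its `weights`, and the numeric values are hypothetical and chosen only for illustration; they are not part of the library. The sketch computes \(S_c(x)\) as the dot product of the CAV with the gradient of the logit, then compares it with a finite-difference estimate obtained by nudging the activations along the CAV.

```python
import math

def class_logit(activations, weights):
    """Toy class logit f_c as a function of the layer activations (hypothetical head)."""
    return sum(w * math.tanh(h) for w, h in zip(weights, activations))

def logit_gradient(activations, weights):
    """Analytic gradient of the toy logit with respect to the activations."""
    return [w * (1.0 - math.tanh(h) ** 2) for w, h in zip(weights, activations)]

def directional_derivative(cav, gradient):
    """S_c(x): dot product of the CAV with the gradient of the class logit."""
    return sum(v * g for v, g in zip(cav, gradient))

activations = [0.5, -1.0, 2.0]  # f_l(x) for a single sample (illustrative values)
weights = [1.0, -2.0, 0.5]      # parameters of the toy head (illustrative values)
cav = [0.6, 0.0, 0.8]           # concept direction v_l (illustrative values)

s_c = directional_derivative(cav, logit_gradient(activations, weights))

# Finite-difference check: move the activations a small step along the CAV
# and measure how much the logit changes per unit step.
eps = 1e-6
nudged = [h + eps * v for h, v in zip(activations, cav)]
s_c_fd = (class_logit(nudged, weights) - class_logit(activations, weights)) / eps
```

Here `s_c` is positive, so for this toy sample a small step in the concept direction increases the logit, and `s_c_fd` agrees with it up to the finite-difference error.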

The TCAV score is the percentage of elements of the class \(c\) for which \(S_c\) is positive.

\[ TCAV_c = \frac{|\{x \in \mathcal{X}^c : S_c(x) > 0\}|}{|\mathcal{X}^c|} \]
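The score itself reduces to counting strictly positive directional derivatives. The sketch below assumes the values \(S_c(x)\) have already been computed for every sample of the class; the helper name `tcav_score` and the sample values are hypothetical.

```python
def tcav_score(directional_derivatives):
    """Fraction of samples whose directional derivative is strictly positive."""
    if not directional_derivatives:
        raise ValueError("at least one sample is required")
    positives = sum(1 for s in directional_derivatives if s > 0)
    return positives / len(directional_derivatives)

# Illustrative S_c(x) values for five samples of the class.
s_values = [0.3, -0.1, 0.7, 0.0, 0.2]
score = tcav_score(s_values)  # 3 of the 5 values are strictly positive
```

Note that a derivative of exactly zero does not count as positive, matching the strict inequality in the formula.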

Example¶

from xplique.concepts import Tcav

tcav_renderer = Tcav(model, 'mixed4d')  # you can also pass the layer index (e.g. -1)
tcav_score = tcav_renderer(samples, class_index, cav)

`Tcav`¶

Used to test a Concept Activation Vector, using the sign of the directional derivative along the concept vector relative to a class.

`__init__(self, model: keras.src.engine.training.Model, target_layer: Union[str, int], batch_size: Optional[int] = 64)`¶

Parameters

model : keras.src.engine.training.Model
- Model to extract concepts from.
target_layer : Union[str, int]
- Index of the target layer or name of the layer.
batch_size : Optional[int] = 64
- Batch size during the predictions.
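When samples are processed in batches, the counts of positive derivatives have to be accumulated over all batches, because averaging per-batch scores would over-weight a smaller final batch. The sketch below shows one way this could be done; it is not the library's implementation, and `batched_tcav_score`, `iter_batches`, and the stand-in `fake_derivatives` function are hypothetical names.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def batched_tcav_score(samples, derivative_fn, batch_size=64):
    """Accumulate positive counts over batches, then divide once at the end."""
    positives = 0
    total = 0
    for batch in iter_batches(samples, batch_size):
        values = derivative_fn(batch)  # one directional derivative per sample
        positives += sum(1 for s in values if s > 0)
        total += len(values)
    if total == 0:
        raise ValueError("at least one sample is required")
    return positives / total

def fake_derivatives(batch):
    """Stand-in for a real directional derivative computation."""
    return [x - 2.5 for x in batch]

samples = [0, 1, 2, 3, 4, 5, 6]
score_small_batches = batched_tcav_score(samples, fake_derivatives, batch_size=2)
score_single_batch = batched_tcav_score(samples, fake_derivatives, batch_size=64)
```

With this aggregation the result does not depend on the batch size: both calls give the same score, since 4 of the 7 stand-in derivatives are positive.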

`directional_derivative(multi_head_model: keras.src.engine.training.Model, inputs: tensorflow.python.framework.tensor.Tensor, label: int, cav: tensorflow.python.framework.tensor.Tensor) -> tensorflow.python.framework.tensor.Tensor`¶

Compute the gradient of the label relative to the activations of the CAV layer.

Parameters

multi_head_model : keras.src.engine.training.Model
- Reconfigured model whose first output is the activations of the CAV layer and whose second output is the prediction layer.
inputs : tensorflow.python.framework.tensor.Tensor
- Input samples on which to test the influence of the concept.
label : int
- Index of the class to test.
cav : tensorflow.python.framework.tensor.Tensor
- Concept Activation Vector, same shape as the activations output.

Return

directional_derivative : tensorflow.python.framework.tensor.Tensor
- Directional derivative value for each sample.

`score(self, inputs: tensorflow.python.framework.tensor.Tensor, label: int, cav: tensorflow.python.framework.tensor.Tensor) -> float`¶

Compute and return the TCAV score of the CAV for the tested class.

Parameters

inputs : tensorflow.python.framework.tensor.Tensor
- Input samples on which to test the influence of the concept.
label : int
- Index of the class to test.
cav : tensorflow.python.framework.tensor.Tensor
- Concept Activation Vector, see CAV module.

Return

tcav : float
- Percentage of samples for which increasing the concept has a positive impact on the class logit.

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) (2018).
