TCAV

TCAV, or Testing with Concept Activation Vectors, consists in using a concept activation vector (CAV) to quantify the relationship between the concept it encodes and a class.

This is done by computing the directional derivative along the concept vector for several samples of a given class and measuring the percentage of samples for which it is positive (a positive directional derivative indicates that an infinitesimal addition of the concept increases the probability of the class).

For a Concept Activation Vector \(v_l\) of a layer \(f_l\) of a model, and \(f_{c}\) the logit of the class \(c\), we measure the directional derivative \(S_c(x) = v_l \cdot \frac{ \partial{f_c(x)} } { \partial{f_l}(x) }\).
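
As a rough illustration of what \(S_c(x)\) measures, the sketch below compares the analytic directional derivative with a finite-difference estimate obtained by nudging the activations a small step along the CAV. The toy logit `logit`, its weights `w`, and the random vectors are hypothetical stand-ins for a real model head and are not part of Xplique.

```python
import numpy as np

def logit(h, w):
    """Toy class logit f_c as a function of layer activations h (hypothetical head)."""
    return float(np.dot(w, np.tanh(h)))

def logit_gradient(h, w):
    """Analytic gradient of the toy logit with respect to the activations h."""
    return w * (1.0 - np.tanh(h) ** 2)

def directional_derivative(h, w, cav):
    """S_c(x) = v_l . (d f_c / d f_l), evaluated at the activations h."""
    return float(np.dot(cav, logit_gradient(h, w)))

rng = np.random.default_rng(0)
h = rng.normal(size=8)    # activations f_l(x) of one sample (made-up values)
w = rng.normal(size=8)    # weights of the toy logit (assumption)
cav = rng.normal(size=8)  # concept activation vector v_l (made-up values)

s_c = directional_derivative(h, w, cav)

# Moving the activations a small step along the CAV changes the logit
# by approximately eps * S_c(x).
eps = 1e-6
finite_diff = (logit(h + eps * cav, w) - logit(h, w)) / eps
```

Here `s_c` and `finite_diff` should agree closely, which is the sense in which a positive \(S_c(x)\) means that adding a little of the concept raises the class logit.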

The TCAV score is the percentage of elements of the class \(c\) for which \(S_c\) is positive.

\[ TCAV_c = \frac{|\{x \in \mathcal{X}^c : S_c(x) > 0\}|}{|\mathcal{X}^c|} \]
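
The score itself reduces to counting signs. The following minimal sketch assumes the gradients of the class logit with respect to the layer activations have already been computed for each sample; the helper `tcav_score` and the toy arrays are illustrative and do not reproduce the library's implementation. It returns the ratio from the formula, a value between 0 and 1.

```python
import numpy as np

def tcav_score(layer_gradients, cav):
    """Fraction of samples whose directional derivative along the CAV is positive.

    layer_gradients: array of shape (n_samples, ...), the gradient of the class
                     logit with respect to the layer activations, per sample.
    cav:             array with the same shape as a single activation.
    """
    grads = np.asarray(layer_gradients, dtype=float)
    v = np.asarray(cav, dtype=float)
    # Flatten each gradient and the CAV, then take one dot product per sample.
    s = grads.reshape(len(grads), -1) @ v.reshape(-1)
    return float(np.mean(s > 0))

# Four made-up samples with two-dimensional activations.
grads = np.array([[1.0, 0.0],
                  [0.5, 0.5],
                  [-1.0, 0.0],
                  [0.0, -2.0]])
cav = np.array([1.0, 1.0])

score = tcav_score(grads, cav)  # S_c = [1, 1, -1, -2], so 2 of 4 are positive
```

Only the sign of each \(S_c(x)\) enters the score, so a few samples with very large derivatives cannot dominate the result.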

Example

from xplique.concepts import Tcav

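# target the layer by name; you can also pass the layer index (e.g. -1)
tcav_renderer = Tcav(model, 'mixed4d')
# samples: inputs of the class under test, cav: concept activation vector for that layer
tcav_score = tcav_renderer(samples, class_index, cav)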

Tcav

Used to test a Concept Activation Vector, using the sign of the directional derivative of a class logit along the concept vector.

__init__(self,
         model: keras.engine.training.Model,
         target_layer: Union[str, int],
         batch_size: Optional[int] = 64)

Parameters

  • model : keras.engine.training.Model

    • Model to extract concept from.

  • target_layer : Union[str, int]

    • Index of the target layer or name of the layer.

  • batch_size : Optional[int] = 64

    • Batch size during the predictions.

directional_derivative(multi_head_model: keras.engine.training.Model,
                       inputs: tf.Tensor,
                       label: int,
                       cav: tf.Tensor) -> tf.Tensor

Compute the gradient of the class logit with respect to the activations of the CAV layer, and from it the directional derivative along the CAV.

Parameters

  • multi_head_model : keras.engine.training.Model

    • Reconfigured model whose first output is the activations of the CAV layer and whose second output is the prediction layer.

  • inputs : tf.Tensor

    • Input samples on which to test the influence of the concept.

  • label : int

    • Index of the class to test.

  • cav : tf.Tensor

    • Concept Activation Vector, same shape as the activations output.

Return

  • directional_derivative : tf.Tensor

    • Directional derivative value for each sample.


score(self,
      inputs: tf.Tensor,
      label: int,
      cav: tf.Tensor) -> float

Compute and return the TCAV score of the CAV for the tested class.

Parameters

  • inputs : tf.Tensor

    • Input samples on which to test the influence of the concept.

  • label : int

    • Index of the class to test.

  • cav : tf.Tensor

    • Concept Activation Vector, see CAV module.

Return

  • tcav : float

    • Percentage of samples for which increasing the concept has a positive impact on the class logit.