CAV

A CAV, or Concept Activation Vector, represents a high-level concept as a vector that indicates the direction in which to move the activations of a layer in order to maximise this concept.

Quote

[...] CAV for a concept is simply a vector in the direction of the values (e.g., activations) of that concept’s set of examples… we derive CAVs by training a linear classifier between a concept’s examples and random counter examples and then taking the vector orthogonal to the decision boundary.

-- Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) (2018).1

For a layer \(f_l\) of a model, we seek the linear classifier \(v_l \in \mathbb{R}^d\) that separates the activations of the positive examples \(\{ f_l(x) : x \in \mathcal{P} \}\) from the activations of the random/negative examples \(\{ f_l(x) : x \in \mathcal{R} \}\). The CAV is the vector normal to the resulting decision boundary.
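To make the construction above concrete, here is a minimal NumPy sketch, not Xplique's implementation. It assumes the activations have already been extracted and flattened to shape `(n_samples, d)`, and it trains a hinge-loss linear classifier with plain subgradient descent. The function name `fit_linear_cav`, its hyperparameters, the synthetic data and the final unit-norm normalisation are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_cav(pos_acts, neg_acts, epochs=300, lr=0.1, reg=1e-3):
    """Fit a hinge-loss linear classifier and return its unit-norm normal vector."""
    X = np.concatenate([pos_acts, neg_acts]).astype(np.float64)
    # +1 for concept examples (P), -1 for random examples (R).
    y = np.concatenate([np.ones(len(pos_acts)), -np.ones(len(neg_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0  # samples inside the margin or misclassified
        # Subgradient of the mean hinge loss plus an L2 penalty on w.
        grad_w = reg * w - (y[active][:, None] * X[active]).sum(axis=0) / len(X)
        grad_b = -y[active].sum() / len(X)
        w -= lr * grad_w
        b -= lr * grad_b
    # Normalising is a convenience of this sketch, not a documented behaviour.
    return w / np.linalg.norm(w)


# Synthetic "activations": the concept shifts samples along the first axis.
rng = np.random.default_rng(0)
d = 8
concept_dir = np.zeros(d)
concept_dir[0] = 1.0
pos_acts = rng.normal(size=(200, d)) + 3.0 * concept_dir
neg_acts = rng.normal(size=(200, d))

cav = fit_linear_cav(pos_acts, neg_acts)
cosine = float(cav @ concept_dir)  # alignment with the planted direction
```

On this toy data the recovered vector should point mostly along the planted direction, which is the sense in which a CAV "indicates the direction" of a concept in activation space.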

Example

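from xplique.concepts import Cav

# Target the 'mixed4d' layer and keep 10% of the dataset for testing the classifier.
cav_renderer = Cav(model, 'mixed4d', classifier='SGD', test_fraction=0.1)
# positive_examples contain the concept; random_examples act as counter-examples.
cav = cav_renderer(positive_examples, random_examples)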

Cav

Computes the Concept Activation Vector of a concept, that is, a vector pointing in the direction of the activations of the concept's set of examples.

__init__(self,
         model: keras.src.engine.training.Model,
         target_layer: Union[str, int],
         classifier: Union[str, Callable] = 'SGD',
         test_fraction: float = 0.2,
         batch_size: int = 64,
         verbose: bool = False)

Parameters

  • model : keras.src.engine.training.Model

    • Model to extract concept from.

  • target_layer : Union[str, int]

    • Index of the target layer or name of the layer.

  • classifier : 'SGD' or 'SVC' or Sklearn model, optional

    • The default implementation uses SGD with a hinge loss (a linear SVM). 'SVC' uses libsvm, but the computation time is longer.

  • test_fraction : float = 0.2

    • Fraction of the dataset held out for testing the classifier.

  • batch_size : int = 64

    • Batch size used when extracting the activations.

  • verbose : bool = False

    • If true, display information while training the classifier
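As a rough illustration of what `test_fraction` controls, the sketch below holds out a fraction of the activations so that the classifier can be evaluated on samples it was not trained on. This is a generic hold-out split written with NumPy, not Xplique's code; the helper `split_holdout` and the toy data are hypothetical.

```python
import numpy as np


def split_holdout(activations, labels, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold out `test_fraction` of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(activations))
    n_test = int(round(test_fraction * len(activations)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (activations[train_idx], labels[train_idx],
            activations[test_idx], labels[test_idx])


# Toy data: 10 flattened activations, first half labelled as concept examples.
acts = np.arange(20, dtype=np.float64).reshape(10, 2)
labels = np.array([1] * 5 + [-1] * 5)
x_train, y_train, x_test, y_test = split_holdout(acts, labels, test_fraction=0.2)
```

With 10 samples and `test_fraction=0.2`, two samples are held out and eight are used for training.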

fit(self,
    positive_dataset: tf.Tensor,
    negative_dataset: tf.Tensor) -> tf.Tensor

Compute and return the Concept Activation Vector (CAV) associated with the given datasets and the targeted layer.

Parameters

  • positive_dataset : tf.Tensor

    • Dataset of positive samples, i.e. samples containing the concept.

  • negative_dataset : tf.Tensor

    • Dataset of negative samples, i.e. samples without the concept.

Return

  • cav : tf.Tensor

    • Vector of the same shape as the layer output
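The shape convention of the return value can be sketched as follows, assuming "the layer output" means the activation of a single example, without the batch axis. The layer shape `(4, 4, 3)` is hypothetical, and a difference of class means stands in for the trained classifier's weight vector; it is not the SGD or SVC classifier described above.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_shape = (4, 4, 3)  # hypothetical (height, width, channels) of the target layer

# Stand-ins for the activations of positive and random examples.
pos_acts = rng.normal(loc=1.0, size=(16, *layer_shape))
neg_acts = rng.normal(loc=0.0, size=(16, *layer_shape))

# A linear classifier works on flat vectors, so each activation is flattened.
pos_flat = pos_acts.reshape(len(pos_acts), -1)
neg_flat = neg_acts.reshape(len(neg_acts), -1)

# Stand-in direction (difference of class means) in the flattened space.
flat_direction = pos_flat.mean(axis=0) - neg_flat.mean(axis=0)

# Reshaping gives a vector with the same shape as one layer output.
cav_like = flat_direction.reshape(layer_shape)

# It can then be compared element-wise with any activation of that layer.
score = float(np.sum(cav_like * pos_acts[0]))
```

Keeping the vector in the layer's own shape means it can be combined directly with activations or gradients taken at that layer, without tracking a separate flattening step.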