CAV

A CAV, or Concept Activation Vector, represents a high-level concept as a vector that indicates the direction in which the activations of a layer should move to maximise that concept.

Quote

[...] CAV for a concept is simply a vector in the direction of the values (e.g., activations) of that concept’s set of examples… we derive CAVs by training a linear classifier between a concept’s examples and random counter examples and then taking the vector orthogonal to the decision boundary.

-- Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) (2018).¹

For a layer \(f_l\) of a model, we seek the linear classifier \(v_l \in \mathbb{R}^d\), where \(d\) is the dimension of the layer's activations, that separates the activations of the positive examples \(\{ f_l(x) : x \in \mathcal{P} \}\) from the activations of the random/negative examples \(\{ f_l(x) : x \in \mathcal{R} \}\). The CAV is the vector \(v_l\), normal to the resulting decision boundary.
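A short derivation of why the normal vector is the direction to take to maximise the concept. It assumes the classifier scores activations with an affine function and that positive examples fall on the side where the score is positive; the score \(g_l\) and bias \(b\) are notation introduced here, not taken from the paper:

\[
g_l(x) = v_l^\top f_l(x) + b, \qquad g_l(x) > 0 \ \text{for } x \in \mathcal{P}, \qquad g_l(x) < 0 \ \text{for } x \in \mathcal{R}.
\]

The decision boundary is the hyperplane \(\{ a \in \mathbb{R}^d : v_l^\top a + b = 0 \}\), whose normal is \(v_l\). For activations \(a = f_l(x)\),

\[
\nabla_a \left( v_l^\top a + b \right) = v_l, \qquad v_l^\top (a + \epsilon v_l) + b = v_l^\top a + b + \epsilon \lVert v_l \rVert^2,
\]

so among all unit directions, \(v_l / \lVert v_l \rVert\) increases the concept score fastest, and any step \(\epsilon > 0\) along \(v_l\) moves the activations toward the concept side of the boundary.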

Example

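from xplique.concepts import Cav

# `model` is a trained Keras model with a layer named 'mixed4d';
# `positive_examples` holds inputs showing the concept, `random_examples` inputs without it.
cav_renderer = Cav(model, 'mixed4d', classifier='SGD', test_fraction=0.1)
# Calling the instance computes the CAV for the 'mixed4d' layer.
cav = cav_renderer(positive_examples, random_examples)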
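As a rough sketch of the underlying idea, independent of Xplique, the following trains the default kind of classifier (stochastic gradient descent with hinge loss, i.e. a linear SVM, here scikit-learn's `SGDClassifier`) on synthetic activations and keeps the normal to its decision boundary. The helper `compute_cav`, the synthetic data and the unit-norm normalisation are assumptions for illustration, not Xplique's implementation.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def compute_cav(pos_acts, neg_acts, seed=0):
    """Hypothetical helper: train a hinge-loss SGD classifier (a linear SVM)
    separating concept activations (label 1) from random ones (label 0),
    then return the unit normal to its decision boundary (an assumed normalisation)."""
    X = np.concatenate([pos_acts, neg_acts])
    y = np.concatenate([np.ones(len(pos_acts)), np.zeros(len(neg_acts))])
    clf = SGDClassifier(loss="hinge", random_state=seed).fit(X, y)
    w = clf.coef_[0]  # normal to the hyperplane, pointing toward class 1 (the concept)
    return w / np.linalg.norm(w)


# Synthetic "activations": the concept shifts samples along a hidden direction.
rng = np.random.default_rng(0)
d = 16
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
neg = rng.normal(size=(500, d))
pos = rng.normal(size=(500, d)) + 4.0 * true_dir

cav = compute_cav(pos, neg)
print(float(cav @ true_dir))  # cosine similarity with the hidden concept direction
```

On this data the recovered vector should point roughly along `true_dir`, and concept samples project higher on it than random ones.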

`Cav`

Computes a Concept Activation Vector (CAV): a vector pointing in the direction of the activations of a concept's set of examples.

`__init__(self, model: keras.src.engine.training.Model, target_layer: Union[str, int], classifier: Union[str, Callable] = 'SGD', test_fraction: float = 0.2, batch_size: int = 64, verbose: bool = False)`

Parameters

model : keras.src.engine.training.Model
- Model to extract the concept from.
target_layer : Union[str, int]
- Index of the target layer or name of the layer.
classifier : 'SGD' or 'SVC' or Sklearn model, optional
- The default, 'SGD', uses stochastic gradient descent with a hinge loss (a linear SVM); 'SVC' uses libsvm but takes longer to compute. A scikit-learn model can also be passed.
test_fraction : float = 0.2
- Fraction of the dataset held out for testing the classifier.
batch_size : int = 64
- Batch size used when extracting the activations.
verbose : bool = False
- If True, display information while training the classifier.
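A sketch of how the `classifier` and `test_fraction` options could map onto scikit-learn. It assumes 'SGD' corresponds to `SGDClassifier(loss="hinge")` and 'SVC' to `SVC(kernel="linear")` (the linear kernel is an assumption, needed so the model exposes a weight vector), and that a custom estimator is copied before fitting. `make_classifier` and `fit_and_score` are hypothetical helpers, not part of Xplique.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def make_classifier(classifier="SGD"):
    """Hypothetical mapping from the `classifier` argument to an estimator."""
    if isinstance(classifier, str):
        if classifier == "SGD":
            # Stochastic gradient descent with hinge loss, i.e. a linear SVM.
            return SGDClassifier(loss="hinge", random_state=0)
        if classifier == "SVC":
            # libsvm-backed SVM; linear kernel assumed so that coef_ exists.
            return SVC(kernel="linear")
        raise ValueError(f"unknown classifier: {classifier!r}")
    # A user-supplied scikit-learn estimator: fit a fresh, unfitted copy.
    return clone(classifier)


def fit_and_score(X, y, classifier="SGD", test_fraction=0.2, seed=0):
    """Hold out `test_fraction` of the samples, fit on the rest,
    and return the fitted classifier with its held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_fraction, random_state=seed, stratify=y
    )
    clf = make_classifier(classifier).fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)


# Usage on synthetic, well-separated activations (label 1 = concept).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(size=(200, 8)) + 3.0, rng.normal(size=(200, 8))])
y = np.array([1] * 200 + [0] * 200)
for choice in ("SGD", "SVC", LogisticRegression()):
    clf, acc = fit_and_score(X, y, choice, test_fraction=0.1)
    print(type(clf).__name__, acc)
```

Holding out a test split gives a cheap check that the concept is linearly separable at the chosen layer before trusting the resulting vector.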

`fit(self, positive_dataset: tf.Tensor, negative_dataset: tf.Tensor) -> tf.Tensor`

Compute and return the Concept Activation Vector (CAV) associated with the given datasets and the target layer.

Parameters

positive_dataset : tf.Tensor
- Dataset of positive samples: samples containing the concept.
negative_dataset : tf.Tensor
- Dataset of negative samples: samples without the concept.

Return

cav : tf.Tensor
- Vector of the same shape as the layer output
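To illustrate how the returned CAV can have the same shape as the layer output, here is a sketch assuming each sample's activations are flattened before training and the classifier's weight vector is reshaped back afterwards. It works on NumPy arrays rather than `tf.Tensor`, and `cav_from_activations` is a hypothetical helper, not Xplique's code.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def cav_from_activations(pos_acts, neg_acts):
    """pos_acts, neg_acts: activations of shape (n, *layer_shape), e.g.
    (n, h, w, c) for a convolutional layer. Each sample is flattened to
    train the linear classifier; the weight vector is reshaped back."""
    layer_shape = pos_acts.shape[1:]
    n = len(pos_acts) + len(neg_acts)
    X = np.concatenate([pos_acts, neg_acts]).reshape(n, -1)
    y = np.concatenate([np.ones(len(pos_acts)), np.zeros(len(neg_acts))])
    clf = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
    return clf.coef_[0].reshape(layer_shape)


rng = np.random.default_rng(0)
pos = rng.normal(size=(64, 4, 4, 8)) + 1.0  # synthetic "concept" activations
neg = rng.normal(size=(64, 4, 4, 8))        # synthetic random activations
cav = cav_from_activations(pos, neg)
print(cav.shape)  # (4, 4, 8)
```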

¹ Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) (2018).
