CAV
A CAV, or Concept Activation Vector, represents a high-level concept as a vector indicating the direction in which to move the activations of a layer in order to maximise this concept.
Quote
[...] CAV for a concept is simply a vector in the direction of the values (e.g., activations) of that concept’s set of examples… we derive CAVs by training a linear classifier between a concept’s examples and random counter examples and then taking the vector orthogonal to the decision boundary.
-- Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) (2018).1
For a layer \(f_l\) of a model, we seek the linear classifier \(v_l \in \mathbb{R}^d\) that separates the activations of the positive examples \(\{ f_l(x) : x \in \mathcal{P} \}\) from the activations of the random/negative examples \(\{ f_l(x) : x \in \mathcal{R} \}\).
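As a rough illustration of this idea, the sketch below trains a linear classifier with hinge loss by plain stochastic gradient descent on synthetic "activations" and takes its normalised weight vector as the CAV. It uses only NumPy and is not Xplique's implementation: the synthetic data, the helper `fit_linear_cav`, and all hyperparameters are assumptions made for the example.

```python
import numpy as np

def fit_linear_cav(pos_acts, neg_acts, epochs=50, lr=0.01, reg=1e-3, seed=0):
    """Hypothetical helper: hinge-loss SGD on flattened activations.

    Returns the unit-norm normal vector of the learned hyperplane.
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate([pos_acts, neg_acts]).astype(np.float64)
    # Labels: +1 for concept examples, -1 for random counter-examples.
    y = np.concatenate([np.ones(len(pos_acts)), -np.ones(len(neg_acts))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            margin = y[i] * (x[i] @ w + b)
            # Sub-gradient of the L2-regularised hinge loss.
            grad_w = reg * w
            grad_b = 0.0
            if margin < 1.0:
                grad_w = grad_w - y[i] * x[i]
                grad_b = -y[i]
            w -= lr * grad_w
            b -= lr * grad_b
    # The vector orthogonal to the decision boundary is the weight vector.
    return w / np.linalg.norm(w)

# Synthetic stand-ins for f_l(x): the "concept" shifts activations along axis 0.
rng = np.random.default_rng(42)
d = 8
random_acts = rng.normal(size=(200, d))
positive_acts = rng.normal(size=(200, d))
positive_acts[:, 0] += 3.0

cav = fit_linear_cav(positive_acts, random_acts)
```

Projecting activations onto `cav` (for example `positive_acts @ cav`) should, on average, give larger values for concept examples than for random ones, which is what "direction of the concept" means here.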
Example
from xplique.concepts import Cav
cav_renderer = Cav(model, 'mixed4d', classifier='SGD', test_fraction=0.1)
cav = cav_renderer(positive_examples, random_examples)
Cav
Used to compute the Concept Activation Vector, which is a vector in the direction of the
activations of a concept’s set of examples.
__init__(self,
model: keras.engine.training.Model,
target_layer: Union[str, int],
classifier: Union[str, Callable] = 'SGD',
test_fraction: float = 0.2,
batch_size: int = 64,
verbose: bool = False)
Parameters
- model : keras.engine.training.Model
  Model to extract the concept from.
- target_layer : Union[str, int]
  Index or name of the target layer.
- classifier : 'SGD' or 'SVC' or Sklearn model, optional
  The default implementation uses SGD with hinge loss (a linear SVM). 'SVC' uses libsvm, but the computation time is longer.
- test_fraction : float = 0.2
  Fraction of the dataset held out for testing the classifier.
- batch_size : int = 64
  Batch size used when extracting the activations.
- verbose : bool = False
  If True, display information while training the classifier.
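To make the role of `test_fraction` concrete, here is a minimal NumPy sketch of a shuffled train/test split over activations and labels. How Xplique performs the split internally is not stated here, so the helper `split_train_test` and its rounding rule are assumptions for illustration only.

```python
import numpy as np

def split_train_test(features, labels, test_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle, then hold out `test_fraction` of the rows."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(round(test_fraction * len(features)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])

# Ten toy "activation" rows: five positive (1) and five negative (0) examples.
features = np.arange(20, dtype=np.float64).reshape(10, 2)
labels = np.array([1] * 5 + [0] * 5)

x_train, y_train, x_test, y_test = split_train_test(
    features, labels, test_fraction=0.2
)
```

With ten examples and `test_fraction=0.2`, two rows are held out to evaluate the classifier and eight are used to fit it.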
fit(self,
positive_dataset: tf.Tensor,
negative_dataset: tf.Tensor) -> tf.Tensor
Compute and return the Concept Activation Vector (CAV) associated with the dataset and the
target layer.
Parameters
- positive_dataset : tf.Tensor
  Dataset of positive samples: samples containing the concept.
- negative_dataset : tf.Tensor
  Dataset of negative samples: samples without the concept.
Return
- cav : tf.Tensor
  Vector of the same shape as the layer output.
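The shape relationship can be sketched as follows, assuming (this is an assumption, not something confirmed above) that activations are flattened to one row per example before the linear classifier is fitted and that the learned weights are reshaped back afterwards. The layer shape and the all-ones `weights` are placeholders.

```python
import numpy as np

# Hypothetical layer output for a batch of 4 inputs: (batch, height, width, channels).
activations = np.random.default_rng(0).normal(size=(4, 7, 7, 16))
layer_output_shape = activations.shape[1:]

# A linear classifier needs one row per example.
flat = activations.reshape(len(activations), -1)

# Stand-in for the learned weight vector (one coefficient per flattened unit).
weights = np.ones(flat.shape[1])

# Reshape back so the CAV lines up element-wise with a single layer output.
cav = weights.reshape(layer_output_shape)

# Projection of each example's activations onto the CAV direction.
scores = np.tensordot(activations, cav, axes=3)
```

Because the CAV has the layer's output shape, it can be combined element-wise with activations or with gradients taken with respect to that layer.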