TCAV
TCAV, or Testing with Concept Activation Vectors, uses a concept activation vector (CAV) to quantify the relationship between a concept and a class.
This is done by computing the directional derivative along the concept vector for several samples of a given class and measuring the percentage of samples for which it is positive (a positive directional derivative indicates that an infinitesimal addition of the concept increases the class logit).
Given a Concept Activation Vector \(v_l\) for a layer \(f_l\) of a model, and \(f_c\) the logit of class \(c\), we measure the directional derivative \(S_c(x) = v_l \cdot \frac{\partial f_c(x)}{\partial f_l(x)}\).
The TCAV score is the percentage of elements \(x\) of the class \(c\) for which \(S_c(x)\) is positive.
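The definition above can be illustrated with a minimal NumPy sketch. It assumes the gradients of the class logit with respect to the layer activations are already available. The function name `tcav_score`, the gradient values, and the choice to return a fraction in [0, 1] rather than a percentage are all hypothetical and only serve the illustration; this is not the library's implementation.

```python
import numpy as np


def tcav_score(gradients: np.ndarray, cav: np.ndarray) -> float:
    """Fraction of samples whose directional derivative along `cav` is positive."""
    # Flatten so that each row holds one sample's gradient and the CAV is a vector.
    grads = gradients.reshape(len(gradients), -1)
    direction = cav.reshape(-1)
    # S_c(x) = v_l . (d f_c / d f_l)(x), one value per sample.
    directional_derivatives = grads @ direction
    return float(np.mean(directional_derivatives > 0))


# Hypothetical gradients for four samples of class c at a 3-unit layer.
gradients = np.array([
    [0.5, -0.1, 0.2],
    [0.3, 0.4, -0.6],
    [-0.2, 0.1, 0.0],
    [0.1, 0.0, 0.3],
])
cav = np.array([1.0, 0.0, 1.0])

score = tcav_score(gradients, cav)
print(score)  # 0.5: two of the four directional derivatives are positive
```

Here the directional derivatives are 0.7, -0.3, -0.2 and 0.4, so the concept has a positive influence on half of the samples.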
Example
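from xplique.concepts import Tcav

# model: a Keras model; 'mixed4d' is the name of the layer the CAV refers to
tcav_renderer = Tcav(model, 'mixed4d')  # you can also pass the layer index (e.g. -1)
# samples: inputs of the class to test; cav: Concept Activation Vector for that layer
tcav_score = tcav_renderer(samples, class_index, cav)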
Tcav
Used to test a Concept Activation Vector, using the sign of the directional derivative of a concept vector relative to a class.
__init__(self,
model: keras.src.engine.training.Model,
target_layer: Union[str, int],
batch_size: Optional[int] = 64)
Parameters
- model : keras.src.engine.training.Model
  Model to extract concept from.
- target_layer : Union[str, int]
  Index of the target layer or name of the layer.
- batch_size : Optional[int] = 64
  Batch size during the predictions.
directional_derivative(multi_head_model: keras.src.engine.training.Model,
inputs: tensorflow.python.framework.tensor.Tensor,
label: int,
cav: tensorflow.python.framework.tensor.Tensor) -> tensorflow.python.framework.tensor.Tensor
Compute the gradient of the label output with respect to the activations of the CAV layer, projected onto the CAV.
Parameters
- multi_head_model : keras.src.engine.training.Model
  Reconfigured model whose first output is the activations of the CAV layer and whose second output is the prediction layer.
- inputs : tensorflow.python.framework.tensor.Tensor
  Input samples on which to test the influence of the concept.
- label : int
  Index of the class to test.
- cav : tensorflow.python.framework.tensor.Tensor
  Concept Activation Vector, same shape as the activations output.
Return
- directional_derivative : tensorflow.python.framework.tensor.Tensor
  Directional derivative value for each sample.
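To make the returned quantity concrete, here is a hedged NumPy sketch on a toy model. The quadratic `toy_logit` head, its analytic gradient, and all values are assumptions made for illustration; the actual method works on a Keras model and would obtain the gradient through automatic differentiation rather than a hand-written formula. The finite-difference check shows the interpretation of the sign: a positive value means a small step along the CAV raises the class output.

```python
import numpy as np


def toy_logit(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical class logit computed from layer activations: sum_i w_i * a_i**2."""
    return np.sum(weights * activations ** 2, axis=-1)


def toy_logit_gradient(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Analytic gradient of toy_logit with respect to the activations."""
    return 2.0 * weights * activations


def toy_directional_derivative(
    activations: np.ndarray, weights: np.ndarray, cav: np.ndarray
) -> np.ndarray:
    """Project each sample's gradient onto the CAV (one value per sample)."""
    return np.sum(toy_logit_gradient(activations, weights) * cav, axis=-1)


activations = np.array([[1.0, 2.0], [-1.0, 0.5]])  # two samples, two units
weights = np.array([0.5, -1.0])
cav = np.array([1.0, 0.0])  # same shape as one sample's activations

analytic = toy_directional_derivative(activations, weights, cav)

# Finite-difference check: nudge the activations a small step along the CAV.
eps = 1e-6
numeric = (
    toy_logit(activations + eps * cav, weights) - toy_logit(activations, weights)
) / eps
```

In this toy case the first sample has a positive directional derivative and the second a negative one, so only the first would count toward the TCAV score.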
score(self,
inputs: tensorflow.python.framework.tensor.Tensor,
label: int,
cav: tensorflow.python.framework.tensor.Tensor) -> float
Compute and return the TCAV score of the CAV for the tested class.
Parameters
- inputs : tensorflow.python.framework.tensor.Tensor
  Input samples on which to test the influence of the concept.
- label : int
  Index of the class to test.
- cav : tensorflow.python.framework.tensor.Tensor
  Concept Activation Vector, see CAV module.
Return
- tcav : float
  Percentage of samples for which increasing the concept has a positive impact on the class logit.