
Average Drop

Average Drop (AD) measures the relative decrease in the model's confidence when the input is masked according to the explanation. A good explanation should highlight the features that are sufficient for the prediction: when the input is masked so that only those features are kept, the model's score should drop as little as possible.

Quote

We define Average Drop % as the percentage of positive drops averaged over all images.

-- Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks (2018) [1]

Formally, for each sample \(i\):

\[ AD_i = \frac{\text{ReLU}(base_i - after_i)}{base_i + \epsilon} \]

Where:

  • \(base_i = g(f, x_i, y_i)\) is the model score on the original input
  • \(after_i = g(f, x_i \odot M_i, y_i)\) is the model score after masking the input with the explanation-derived mask \(M_i\)
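
As a quick worked example, the per-sample formula can be checked with a few lines of NumPy (the epsilon value below is arbitrary and purely illustrative):

import numpy as np

def average_drop(base, after, eps=1e-8):
    # per-sample Average Drop: ReLU(base - after) / (base + eps)
    return np.maximum(base - after, 0.0) / (base + eps)

# toy confidences on the original vs. masked inputs
base = np.array([0.90, 0.50, 0.30])
after = np.array([0.60, 0.55, 0.03])
print(average_drop(base, after))  # [0.333..., 0.0, 0.9]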

Info

The better the explanation, the lower the Average Drop score. A low score indicates that the explanation correctly identifies important features.

Score Interpretation

  • Lower scores are better: A low Average Drop means that the features identified as important by the explanation are correctly captured. When the input is masked so that only these features are kept, the model's confidence barely drops (see the sketch after this list).
  • The ReLU ensures we only measure drops, not increases in confidence.
  • The normalization by \(base_i\) makes the metric scale-invariant.
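
To make the masking step concrete, here is a minimal sketch of how an attribution map could be turned into a soft mask and applied to the input. The per-sample min-max normalisation is an assumption for illustration only, not necessarily how the library builds \(M_i\):

import tensorflow as tf

def base_and_masked_scores(model, x, explanations, class_index):
    # assumption: min-max normalise each attribution map to [0, 1] and use it as a soft mask
    e_min = tf.reduce_min(explanations, axis=(1, 2), keepdims=True)
    e_max = tf.reduce_max(explanations, axis=(1, 2), keepdims=True)
    mask = (explanations - e_min) / (e_max - e_min + 1e-8)
    base = model(x)[:, class_index]                            # score on the original inputs
    after = model(x * mask[..., tf.newaxis])[:, class_index]   # keep only the highlighted regions
    return base, after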

Example

from xplique.metrics import AverageDropMetric
from xplique.attributions import Saliency

# load images, labels and model
# ...
explainer = Saliency(model)
explanations = explainer(inputs, labels)

metric = AverageDropMetric(model, inputs, labels, activation="softmax")
score = metric.evaluate(explanations)  # single float averaged over the dataset, lower is better

Tip

This metric works best with probabilistic outputs. Set activation="softmax" or "sigmoid" if your model returns logits.
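
For instance, with a binary classifier whose last layer returns a raw logit, the metric can be built as in this sketch (model and data loading are assumed to be set up as in the example above):

# model returns logits, so apply a sigmoid before scoring
metric = AverageDropMetric(model, inputs, labels, activation="sigmoid")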

AverageDropMetric

Average Drop (AD) — measures relative decrease in the model score when the input is masked by the explanation (lower AD is better).

__init__(self,
         model: keras.src.engine.training.Model,
         inputs: Union[tf.Dataset, tensorflow.python.framework.tensor.Tensor, numpy.ndarray],
         targets: Union[tensorflow.python.framework.tensor.Tensor, numpy.ndarray, None] = None,
         batch_size: Optional[int] = 64,
         operator: Union[str, Callable, None] = None,
         activation: Optional[str] = None)

Parameters

  • model : keras.src.engine.training.Model

    • Model used for computing the metric.

  • inputs : Union[tf.Dataset, tensorflow.python.framework.tensor.Tensor, numpy.ndarray]

    • Input samples under study.

  • targets : Union[tensorflow.python.framework.tensor.Tensor, numpy.ndarray, None] = None

    • One-hot encoded labels or regression targets (e.g. {+1, -1}), one for each sample.

  • batch_size : Optional[int] = 64

    • Number of samples to process at once; if None, process all samples in a single batch.

  • operator : Union[str, Callable, None] = None

    • Scoring function g. It should take 3 parameters (f, x, y) and return a scalar, with f the model, x the inputs, and y the targets. If None, the standard operator g(f, x, y) = f(x)[y] is used.

  • activation : Optional[str] = None

    • One of [None, 'sigmoid', 'softmax']. Specifies whether an activation should be applied to the model's output before computing the score.
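
As a sketch of the operator argument, the function below is hypothetical and only illustrates the expected (f, x, y) signature; it returns the target-class probability for each sample:

import tensorflow as tf

def probability_operator(model, inputs, targets):
    # g(f, x, y): probability assigned to the target class of each sample
    probs = tf.nn.softmax(model(inputs), axis=-1)
    return tf.reduce_sum(probs * targets, axis=-1)

metric = AverageDropMetric(model, inputs, labels, operator=probability_operator)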

detailed_evaluate(self,
                  inputs: tensorflow.python.framework.tensor.Tensor,
                  targets: tensorflow.python.framework.tensor.Tensor,
                  explanations: tensorflow.python.framework.tensor.Tensor) -> numpy.ndarray

Compute Average Drop scores for a batch of samples.

Parameters

  • inputs : tf.Tensor

    • Batch of input samples. Shape: (B, H, W, C) for images, (B, T, F) for time series, or (B, ...) for other data types.

  • targets : tf.Tensor

    • Batch of target labels. Shape: (B, num_classes) for one-hot encoded, or (B,) for class indices/regression targets.

  • explanations : tf.Tensor

    • Batch of attribution maps. Shape must be compatible with inputs (same spatial/temporal dimensions, optionally without channel dimension).

Return

  • scores : np.ndarray

    • Per-sample Average Drop scores, shape (B,).

      Values range from 0 (no drop, or an increase in confidence) to ~1 (complete drop).

      Lower values indicate better explanations.
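
A minimal usage sketch, assuming batch_inputs, batch_targets and batch_explanations are tensors for a single batch with the shapes described above:

scores = metric.detailed_evaluate(batch_inputs, batch_targets, batch_explanations)
print(scores.shape)   # (B,) one Average Drop value per sample
print(scores.mean())  # batch-level average, lower is better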


evaluate(self,
         explanations: Union[tensorflow.python.framework.tensor.Tensor, numpy.ndarray]) -> float

Evaluate the metric over the entire dataset by iterating over batches, returning the average score as a single float.