Average Gain
Average Gain (AG) measures the relative increase in the model's confidence when the input is masked according to the explanation. This metric is complementary to Average Drop and evaluates whether explanations capture truly discriminative features.
Quote
We define Opti-CAM, a saliency method that obtains the saliency map by directly optimizing for the Average Gain metric.
-- Opti-CAM: Optimizing saliency maps for interpretability (2024)1
Formally, for each sample \(i\):

\[
\mathrm{AG} = \frac{1}{N} \sum_{i=1}^{N} \frac{\max\left(0,\ after_i - base_i\right)}{1 - base_i}
\]

Where:

- \(base_i = g(f, x_i, y_i)\) is the model score on the original input
- \(after_i = g(f, x_i \odot M_i, y_i)\) is the model score after masking with the explanation-derived mask \(M_i\)
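The definitions above can be checked with a minimal NumPy sketch. The function name `average_gain` and the sample values are illustrative, not part of the Xplique API; per-sample scores are assumed to lie in \([0, 1)\):

```python
import numpy as np

def average_gain(base, after):
    # Illustrative sketch of the AG formula, not the Xplique API.
    # base / after: per-sample model scores in [0, 1).
    base = np.asarray(base, dtype=float)
    after = np.asarray(after, dtype=float)
    # The ReLU keeps only confidence increases; (1 - base) normalizes
    # each gain by the remaining headroom.
    gains = np.maximum(0.0, after - base) / (1.0 - base)
    return gains.mean()

# Sample 1: base 0.6 -> after 0.9 gains 0.3 out of 0.4 headroom = 0.75.
# Sample 2: base 0.8 -> after 0.7 is a drop, so it contributes 0.
score = average_gain([0.6, 0.8], [0.9, 0.7])  # mean of 0.75 and 0.0 -> 0.375
```

Note how a confidence decrease contributes exactly zero rather than a negative value, which is what makes AG complementary to Average Drop.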
Info
The better the explanation, the higher the Average Gain score. A high score indicates that isolated important features are sufficient to maintain or increase the model's confidence.
Score Interpretation
- Higher scores are better: A high Average Gain means that the explanation successfully identifies features which, when isolated, are sufficient to maintain or increase the model's confidence.
- The ReLU ensures we only measure gains, not decreases in confidence.
- The normalization by \((1 - base_i)\) accounts for the remaining headroom to achieve a perfect score.
Example
```python
from xplique.metrics import AverageGainMetric
from xplique.attributions import Saliency

# load images, labels and model
# ...

explainer = Saliency(model)
explanations = explainer(inputs, labels)

metric = AverageGainMetric(model, inputs, labels, activation="softmax")
score = metric.evaluate(explanations)
```
Tip
This metric is intended for scores in [0, 1]. If your model outputs logits, use activation="softmax" or "sigmoid" at construction to operate on probabilities.
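For intuition, here is a minimal sketch (plain NumPy, not the library's internal implementation) of what a softmax activation does to raw logits, mapping them into the \([0, 1]\) range the metric expects:

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 0.5, -1.0]])  # unbounded raw scores
probs = softmax(logits)                # each row sums to 1, values in (0, 1)
```

Without this step, \(1 - base_i\) can be negative for logits above 1, and the normalization by headroom loses its meaning.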
AverageGainMetric
Average Gain (AG) — normalized relative increase when the input is masked
by the explanation (higher AG is better, complementary to AD).
```python
__init__(self,
         model: keras.src.engine.training.Model,
         inputs: Union[tf.data.Dataset, tf.Tensor, numpy.ndarray],
         targets: Union[tf.Tensor, numpy.ndarray, None] = None,
         batch_size: Optional[int] = 64,
         operator: Union[str, Callable, None] = None,
         activation: Optional[str] = None)
```
Parameters

- model : keras.src.engine.training.Model
  Model used to compute the metric.
- inputs : Union[tf.data.Dataset, tf.Tensor, numpy.ndarray]
  Input samples under study.
- targets : Union[tf.Tensor, numpy.ndarray, None] = None
  One-hot encoded labels or regression targets (e.g. {+1, -1}), one for each sample.
- batch_size : Optional[int] = 64
  Number of samples to process at once; if None, process all samples in a single batch.
- operator : Union[str, Callable, None] = None
  Function g to explain. It should take three parameters (f, x, y) and return a scalar, where f is the model, x the inputs and y the targets. If None, the standard operator g(f, x, y) = f(x)[y] is used.
- activation : Optional[str] = None
  One of [None, 'sigmoid', 'softmax']. Specifies whether an activation layer should be applied to the model output before computing the metric.
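To illustrate the operator contract, the sketch below reimplements the standard g(f, x, y) = f(x)[y] for one-hot targets. The function name and the dummy model are hypothetical; only the (f, x, y) signature and the default behaviour come from the documentation above:

```python
import numpy as np

def one_hot_score_operator(model, inputs, targets):
    # Sketch of the standard operator g(f, x, y) = f(x)[y] with one-hot y.
    # `model` is any callable returning (batch, num_classes) scores.
    scores = model(inputs)
    # The one-hot mask selects the target-class score of each sample.
    return np.sum(scores * targets, axis=-1)

# Dummy 3-class "model" returning constant probabilities (illustrative).
dummy_model = lambda x: np.tile([[0.2, 0.7, 0.1]], (len(x), 1))
x = np.zeros((2, 4))                  # two fake samples
y = np.array([[0, 1, 0], [1, 0, 0]])  # one-hot targets: classes 1 and 0
target_scores = one_hot_score_operator(dummy_model, x, y)  # [0.7, 0.2]
```

A custom callable with the same shape of inputs and outputs can be passed as `operator` to score something other than the target-class probability.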
```python
detailed_evaluate(self,
                  inputs: tf.Tensor,
                  targets: tf.Tensor,
                  explanations: tf.Tensor) -> numpy.ndarray
```
Compute Average Gain scores for a batch of samples.
Parameters

- inputs : tf.Tensor
  Batch of input samples. Shape: (B, H, W, C) for images, (B, T, F) for time series, or (B, ...) for other data types.
- targets : tf.Tensor
  Batch of target labels. Shape: (B, num_classes) for one-hot encoded labels, or (B,) for class indices / regression targets.
- explanations : tf.Tensor
  Batch of attribution maps. Shape must be compatible with inputs (same spatial/temporal dimensions, optionally without the channel dimension).

Return

- scores : np.ndarray
  Per-sample Average Gain scores, shape (B,). Values range from 0 (no gain, or a decrease) and can exceed 1. Higher values indicate better explanations.
```python
evaluate(self,
         explanations: Union[tf.Tensor, numpy.ndarray]) -> float
```
Evaluate the metric over the entire dataset by iterating over batches.
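The batching behaviour can be pictured with a small sketch (plain NumPy, assuming a per-batch scoring function; all names are illustrative, not the Xplique internals). Concatenating per-sample scores before averaging keeps the result exact even when the last batch is smaller than `batch_size`:

```python
import numpy as np

def evaluate_in_batches(score_fn, data, batch_size=64):
    # score_fn returns per-sample scores for one batch of data.
    batches = [score_fn(data[i:i + batch_size])
               for i in range(0, len(data), batch_size)]
    # Concatenate then average: equivalent to one pass over the full dataset,
    # unlike averaging per-batch means when batch sizes differ.
    return float(np.concatenate(batches).mean())

scores = np.array([0.0, 0.5, 1.0, 0.25, 0.75])  # precomputed per-sample scores
result = evaluate_in_batches(lambda b: b, scores, batch_size=2)  # 0.5
```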