Model Randomization Metric¶
The Model Randomization metric tests whether explanations degrade when the model's parameters are randomized. It implements a sanity check verifying that an explainer is actually sensitive to the model parameters.
Quote
We propose sanity checks for saliency methods. [...] We find that some widely deployed saliency methods are independent of both the data the model was trained on, and the model parameters.

-- Sanity Checks for Saliency Maps (Adebayo et al., 2018)
For each sample \((x, y)\):
- Compute explanation under the original model
- Randomize model parameters according to a strategy
- Compute explanation under the randomized model
- Measure Spearman rank correlation between both explanations
Info
A low Spearman correlation indicates that explanations are sensitive to the model parameters (desirable for a faithful explainer).
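The per-sample comparison in the last step of the procedure can be sketched directly: flatten both attribution maps and compute Spearman's rank correlation. This is an illustrative sketch using `scipy.stats.spearmanr` (an assumption about the implementation; Xplique computes this internally in `ModelRandomizationMetric.evaluate`).

```python
import numpy as np
from scipy.stats import spearmanr

def randomization_score(expl_original, expl_randomized):
    """Spearman rank correlation between two flattened attribution maps.

    Illustrative sketch only; not Xplique's actual implementation.
    """
    a = np.asarray(expl_original, dtype=np.float64).ravel()
    b = np.asarray(expl_randomized, dtype=np.float64).ravel()
    rho, _ = spearmanr(a, b)
    return float(rho)
```

Identical maps score 1.0 and a map with inverted ranking scores -1.0; an explainer that truly depends on the learned weights is expected to land closer to 0 after randomization.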
Score Interpretation¶
- Lower scores are better: A low correlation means the explanations change significantly when model parameters are randomized, indicating the explainer properly depends on the learned weights.
- Values range from -1 to 1 (the range of Spearman's rank correlation), where 1 means the two explanations are perfectly rank-correlated.
- High correlation values suggest the explainer may not be faithfully using model information.
Randomization Strategies¶
The metric supports different randomization strategies via ModelRandomizationStrategy:
ProgressiveLayerRandomization¶
Randomizes model weights layer-by-layer, starting from the output layers (by default) and progressing toward the input layers.
```python
from xplique.metrics import ProgressiveLayerRandomization

# Randomize top 25% of layers (default)
strategy = ProgressiveLayerRandomization(stop_layer=0.25)

# Randomize up to a specific layer
strategy = ProgressiveLayerRandomization(stop_layer='conv2')

# Randomize from input toward output
strategy = ProgressiveLayerRandomization(stop_layer=3, reverse=False)
```
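To make the strategy concrete, here is a minimal sketch of progressive layer randomization in plain Keras: clone the model, then re-sample the weights of a fraction of its layers starting from the output side. The helper name, the normal re-initialization, and the fraction handling are all illustrative assumptions, not Xplique's actual implementation.

```python
import tensorflow as tf

def progressively_randomize(model, stop_layer=0.25, reverse=True):
    """Return a clone of `model` with a fraction of its layers re-sampled.

    Illustrative sketch: `stop_layer` as a float is read as the fraction of
    weighted layers to randomize, starting from the output side when
    `reverse=True`. Xplique's ProgressiveLayerRandomization may differ.
    """
    clone = tf.keras.models.clone_model(model)
    clone.set_weights(model.get_weights())  # start from the original weights
    layers = [l for l in clone.layers if l.weights]  # skip weight-less layers
    n = max(1, int(round(stop_layer * len(layers))))
    targets = layers[::-1][:n] if reverse else layers[:n]
    for layer in targets:
        # Replace each weight tensor with fresh random values.
        new_weights = [tf.random.normal(w.shape, stddev=0.05).numpy()
                       for w in layer.get_weights()]
        layer.set_weights(new_weights)
    return clone
```

Note that the original model is never mutated: only the clone's layers are re-sampled.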
Example¶
```python
from xplique.metrics import ModelRandomizationMetric, ProgressiveLayerRandomization
from xplique.attributions import Saliency

# load images, labels and model
# ...

explainer = Saliency(model)

# Use default strategy (randomize top 25% of layers)
metric = ModelRandomizationMetric(model, inputs, labels)
score = metric.evaluate(explainer)

# Or specify a custom strategy
strategy = ProgressiveLayerRandomization(stop_layer=0.5)
metric = ModelRandomizationMetric(model, inputs, labels, randomization_strategy=strategy)
score = metric.evaluate(explainer)
```
Warning
This metric clones the model internally for randomization. The original model weights are preserved.
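The guarantee above can be checked directly: mutating a `tf.keras.models.clone_model` copy leaves the source model untouched. A minimal sketch with a toy model (the model itself is an arbitrary example):

```python
import numpy as np
import tensorflow as tf

# Toy model whose weights we snapshot before any randomization.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(2)])
before = [w.copy() for w in model.get_weights()]

# Clone, then scramble the clone's weights (as a randomization strategy would).
clone = tf.keras.models.clone_model(model)
clone.set_weights([np.random.normal(size=w.shape) for w in model.get_weights()])

# The original model's weights are unchanged.
assert all(np.array_equal(b, a) for b, a in zip(before, model.get_weights()))
```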
ModelRandomizationMetric¶
Model Randomization metric.
```python
__init__(self,
         model: Callable,
         inputs: Union[tf.data.Dataset, tf.Tensor, np.ndarray],
         targets: Optional[Union[tf.Tensor, np.ndarray]],
         randomization_strategy: ModelRandomizationStrategy = None,
         batch_size: Optional[int] = 64,
         activation: Optional[str] = None,
         seed: int = 42)
```
Parameters

- `model` : Callable
  Model to be randomized.
- `inputs` : Union[tf.data.Dataset, tf.Tensor, np.ndarray]
  Input samples.
- `targets` : Optional[Union[tf.Tensor, np.ndarray]]
  One-hot encoded labels of shape (N, C), or integer labels which will be one-hot encoded.
- `randomization_strategy` : ModelRandomizationStrategy = None
  Strategy used to randomize the model parameters. If None, a progressive randomization of the top 25% of layers is used.
- `batch_size` : Optional[int] = 64
  Number of samples to evaluate at once.
- `activation` : Optional[str] = None
  Optional activation applied to the model outputs; not used directly by this metric.
- `seed` : int = 42
  Random seed for reproducibility.
```python
evaluate(self,
         explainer: Union[WhiteBoxExplainer, BlackBoxExplainer]) -> float
```

Compute the mean Spearman rank correlation score over the dataset.