Calibration

This module implements the core calibrator, providing a structure to estimate the nonconformity scores on the calibration set and to compute the prediction sets.

class calibration.BaseCalibrator(*, nonconf_score_func, pred_set_func, weight_func=None)

Bases: object

BaseCalibrator offers a framework to compute user-defined nonconformity scores on calibration dataset(s) (fit()) and to use them to construct and/or calibrate prediction sets (calibrate()).

Parameters:

nonconf_score_func (callable) – nonconformity score function.
pred_set_func (callable) – prediction set construction function.
weight_func (callable) – function that takes as argument an array of features X and returns associated “conformality” weights, defaults to None.

Raises:

NotImplementedError – provided weight_func is not suitable.

Regression calibrator example:

Consider a pretrained model \(\hat{f}\), a calibration dataset \((X_{calib}, y_{calib})\) and a test dataset \((X_{test}, y_{test})\). The model \(\hat{f}\) generates predictions on the calibration and test sets:

\[y_{pred, calib}=\hat{f}(X_{calib})\]

\[y_{pred, test}=\hat{f}(X_{test})\]

Two functions need to be defined before instantiating the BaseCalibrator: a nonconformity score function and a function that defines how the prediction sets are computed. In the example below, these are implemented from scratch, but a collection of ready-to-use nonconformity scores and prediction sets is provided in the modules nonconformity_scores and prediction_sets, respectively.

from deel.puncc.api.calibration import BaseCalibrator
import numpy as np

# First, we define a nonconformity score function that takes as argument
# the predicted values y_pred = model(X) and the true labels y_true. In
# this example, we reimplement the mean absolute deviation that is
# already defined in `deel.puncc.api.nonconformity_scores.mad`
def nonconformity_function(y_pred, y_true):
    return np.abs(y_pred - y_true)

# Prediction sets are computed from the point predictions and
# the quantile of the nonconformity scores. The function below returns a
# fixed-size interval around each point prediction.
def prediction_set_function(y_pred, scores_quantile):
    y_lo = y_pred - scores_quantile
    y_hi = y_pred + scores_quantile
    return y_lo, y_hi

# The calibrator is instantiated by passing the two functions defined
# above to the constructor.
calibrator = BaseCalibrator(
    nonconf_score_func=nonconformity_function,
    pred_set_func=prediction_set_function
)

# Generate dummy data and predictions
y_pred_calib = np.random.rand(1000)
y_true_calib = np.random.rand(1000)
y_pred_test = np.random.rand(1000)

# The nonconformity scores are computed by calling the `fit` method
# on the calibration dataset.
calibrator.fit(y_pred=y_pred_calib, y_true=y_true_calib)

# The lower and upper bounds of the prediction interval are then returned
# by the call to calibrate on the new data w.r.t a risk level of 10%.
y_pred_low, y_pred_high = calibrator.calibrate(y_pred=y_pred_test, alpha=.1)

static barber_weights(weights)

Compute and normalize inference weights of the nonconformity distribution based on Barber et al.

Parameters: weights (ndarray) – weights assigned to the samples.
Returns: normalized weights.
Return type: ndarray
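
The normalization step can be illustrated with a minimal sketch. It assumes the convention used in weighted conformal prediction, where the unseen test point carries a unit weight and all weights are rescaled to sum to one. The function name `normalize_weights_sketch` is hypothetical, and the library's own implementation may differ in its details.

```python
import numpy as np

def normalize_weights_sketch(weights):
    """Hypothetical helper: rescale calibration weights to sum to one.

    Assumption: the test point is given a unit weight, so the output has
    one more entry than the input (the last entry is the test-point mass).
    """
    weights = np.asarray(weights, dtype=float)
    total = weights.sum() + 1.0  # calibration weights plus the test point
    return np.append(weights / total, 1.0 / total)

norm_weights = normalize_weights_sketch([1.0, 1.0, 2.0])
```

With equal input weights, this reduces to the uniform mass 1/(n+1) on each calibration score and on the test point.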

calibrate(*, alpha, y_pred, weights=None, correction=<function bonferroni>)

Compute calibrated prediction sets for new examples w.r.t a significance level \(\alpha\).

Parameters:

alpha (float) – significance level (max miscoverage target).
y_pred (Iterable) – predicted values.
weights (Iterable) – weights to be associated with the nonconformity scores. Defaults to None, in which case all the scores are treated as equiprobable.
correction (Callable) – correction for multiple hypothesis testing in the case of multivariate regression. Defaults to Bonferroni correction.

Returns:

prediction set. In case of regression, returns (y_lower, y_upper). In case of classification, returns (classes,).

Return type:

Tuple[ndarray]

Raises:

RuntimeError – calibrate() called before fit().
ValueError – failed check on alpha w.r.t size of the calibration set.
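
To make the role of the correction argument concrete, here is a rough sketch of a Bonferroni-style correction for multivariate regression. It assumes the miscoverage budget \(\alpha\) is split evenly across the d output dimensions, so that a union bound keeps the joint miscoverage at most \(\alpha\). The names `bonferroni_level` and `per_output_quantiles` are hypothetical and are not part of the library.

```python
import numpy as np

def bonferroni_level(alpha, n_outputs):
    """Hypothetical helper: per-output significance level alpha / d."""
    return alpha / n_outputs

def per_output_quantiles(scores, alpha):
    """Conformal quantile of each output column at the corrected level.

    scores: array of shape (n, d), one nonconformity score per sample
    and per output dimension.
    """
    scores = np.asarray(scores, dtype=float)
    n, d = scores.shape
    alpha_d = bonferroni_level(alpha, d)
    # Rank of the finite-sample conformal quantile (1-indexed)
    rank = int(np.ceil((n + 1) * (1 - alpha_d)))
    if rank > n:
        raise ValueError("alpha is too small for this calibration set size")
    return np.sort(scores, axis=0)[rank - 1]

scores = np.column_stack([[3, 1, 7, 5, 2, 6, 4],
                          [30, 10, 70, 50, 20, 60, 40]])
quantiles = per_output_quantiles(scores, alpha=0.5)
```

Note that the correction makes each per-output level stricter, so a larger calibration set is needed to pass the check on \(\alpha\).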

compute_quantile(*, alpha, weights=None, correction=<function bonferroni>)

Compute quantile of scores w.r.t a significance level \(\alpha\).

Parameters:

alpha (float) – significance level (max miscoverage target).
weights (Iterable) – weights to be associated with the nonconformity scores. Defaults to None, in which case all the scores are treated as equiprobable.
correction (Callable) – correction for multiple hypothesis testing in the case of multivariate regression. Defaults to Bonferroni correction.

Returns:

quantile

Return type:

ndarray

Raises:

RuntimeError – compute_quantile() called before fit().
ValueError – failed check on alpha w.r.t size of the calibration set.
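
As an illustration of the unweighted case, the following sketch computes the usual finite-sample conformal quantile, that is, the \(\lceil (n+1)(1-\alpha) \rceil\)-th smallest of the n calibration scores. It also shows why a check on \(\alpha\) is needed: when \(\alpha < 1/(n+1)\), the required rank exceeds n. The function name `conformal_quantile_sketch` is hypothetical, and the library may compute the quantile differently (for instance when weights are supplied).

```python
import numpy as np

def conformal_quantile_sketch(scores, alpha):
    """Hypothetical helper: finite-sample corrected (1 - alpha) quantile."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    # 1-indexed rank of the score used as the quantile
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    if rank > n:
        raise ValueError(
            f"alpha={alpha} is too small for a calibration set of size {n}"
        )
    return scores[rank - 1]

q = conformal_quantile_sketch([0.3, 0.1, 0.4, 0.2], alpha=0.5)
```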

fit(*, y_true, y_pred)

Compute and store nonconformity scores on the calibration set.

Parameters:

y_true (Iterable) – true labels.
y_pred (Iterable) – predicted values.

Return type:

None

get_nonconformity_scores()

Getter of computed nonconformity scores on the calibration set.

Returns: nonconformity scores.
Return type: np.ndarray

get_norm_weights()

Getter of normalized weights associated with the nonconformity scores on the calibration set.

Returns: normalized weights of nonconformity scores.
Return type: np.ndarray

set_norm_weights(norm_weights)

Setter of normalized weights associated with the nonconformity scores on the calibration set.

Parameters: norm_weights (ndarray) – normalized weights array.
Return type: None

class calibration.CvPlusCalibrator(kfold_calibrators)

Bases: object

Meta calibrator that combines the estimations of nonconformity scores by each K-Fold calibrator and produces associated prediction intervals based on CV+.

Parameters: kfold_calibrators (dict) – collection of calibrators for each K-fold (disjoint calibration subsets). Each calibrator must have already estimated the nonconformity scores on its associated calibration fold.

calibrate(*, X, kfold_predictors_dict, alpha)

Compute calibrated prediction intervals for new examples X.

Parameters:

X (Iterable) – test features.
kfold_predictors_dict (dict) – dictionary of predictors trained on each fold.
alpha (float) – significance level (maximum miscoverage target).

Returns:

y_lower, y_upper.

Return type:

Tuple[ndarray]
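
For intuition, the CV+ interval for a single test point can be sketched in plain NumPy. The sketch assumes absolute residuals as nonconformity scores: each calibration point i contributes the prediction of the fold model that did not see it, shifted down and up by its residual; the lower bound is the \(\lfloor \alpha(n+1) \rfloor\)-th smallest shifted-down value and the upper bound is the \(\lceil (1-\alpha)(n+1) \rceil\)-th smallest shifted-up value. The function `cv_plus_interval_sketch` and its arguments are hypothetical and do not mirror the library's internals.

```python
import numpy as np

def cv_plus_interval_sketch(fold_preds, residuals, alpha):
    """Hypothetical helper: CV+ interval for one test point.

    fold_preds[i]: prediction at the test point by the model trained
                   without the fold containing calibration point i.
    residuals[i]:  absolute residual of that same model on point i.
    """
    fold_preds = np.asarray(fold_preds, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    lo_rank = int(np.floor(alpha * (n + 1)))       # 1-indexed
    hi_rank = int(np.ceil((1 - alpha) * (n + 1)))  # 1-indexed
    if lo_rank < 1 or hi_rank > n:
        raise ValueError("alpha is too small for this calibration set size")
    y_lower = np.sort(fold_preds - residuals)[lo_rank - 1]
    y_upper = np.sort(fold_preds + residuals)[hi_rank - 1]
    return y_lower, y_upper

y_lower, y_upper = cv_plus_interval_sketch(
    fold_preds=[1.0, 1.0, 2.0, 2.0],
    residuals=[0.5, 0.25, 0.5, 0.25],
    alpha=0.5,
)
```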

fit()

Check if all calibrators have already been fitted.

Raises: RuntimeError – one or more of the calibrators did not estimate the nonconformity scores.
Return type: None

class calibration.ScoreCalibrator(nonconf_score_func, weight_func=None)

Bases: object

ScoreCalibrator offers a framework to compute user-defined scores on a calibration dataset (fit()) and to test the conformity of new data points (is_conformal()) with respect to a significance (error) level \(\alpha\). Such a calibrator can be used, for example, to calibrate the decision threshold of anomaly detection scores.

Parameters:

nonconf_score_func (callable) – nonconformity score function.
weight_func (callable) – function that takes as argument an array of data points and returns associated “conformality” weights, defaults to None.

Anomaly detection example:

Consider the two moons dataset. We want to detect anomalous points in a new sample generated following a uniform distribution. The LOF algorithm is used to obtain anomaly scores; then a ScoreCalibrator is instantiated to decide which scores are conformal (not anomalies) with respect to a significance level \(\alpha\).

import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor
import matplotlib.pyplot as plt

from deel.puncc.api.calibration import ScoreCalibrator

# First, we generate the two moons dataset
dataset = 4*make_moons(n_samples=1000, noise=0.05,
    random_state=0)[0] - np.array([0.5, 0.25])

# Split data into proper fitting and calibration sets
fit_set, calib_set = train_test_split(dataset, train_size=.8)

# Generate new data points
rng = np.random.RandomState(42)
new_samples = rng.uniform(low=-6, high=8, size=(200, 2))

# Instantiate the LOF anomaly detection algorithm
algorithm = LocalOutlierFactor(n_neighbors=35, novelty=True)

# Fit the LOF on the proper fitting dataset
algorithm.fit(X=fit_set)

# The nonconformity scores are defined as the LOF (anomaly) scores.
# By default, score_samples returns the opposite of the LOF scores,
# hence the sign flip below.
ncf = lambda X: -algorithm.score_samples(X)

# The ScoreCalibrator is instantiated by passing the LOF score function
# to the constructor
cad = ScoreCalibrator(nonconf_score_func=ncf)

# The LOF scores are computed by calling the `fit` method
# on the calibration dataset
cad.fit(calib_set)

# We set the target false detection rate to 1%
alpha = .01

# The method `is_conformal` is called on the new data points
# to test which are conformal (not anomalous) and which are not
results = cad.is_conformal(z=new_samples, alpha=alpha)
not_anomalies = new_samples[results]
anomalies = new_samples[np.invert(results)]

# Plot the results
plt.scatter(calib_set[:,0], calib_set[:,1],
            s=10, label="Inliers")
plt.scatter(not_anomalies[:, 0], not_anomalies[:, 1], s=40, marker="x",
            color="blue", label="Normal")
plt.scatter(anomalies[:, 0], anomalies[:, 1], s=40, marker="x",
            color="red", label="Anomaly")
plt.xticks(())
plt.yticks(())
plt.legend(loc="lower left")

fit(z)

Compute and store nonconformity scores on the calibration set.

Parameters: z (Iterable) – calibration dataset.

get_nonconformity_scores()

Getter of computed nonconformity scores on the calibration set.

Returns: nonconformity scores.
Return type: np.ndarray

is_conformal(z, alpha)

Test if new data points z are conformal. The test result is True if the new sample is conformal w.r.t a significance level \(\alpha\) and False otherwise.

Parameters:

z (Iterable) – new samples.
alpha (float) – significance level.

Returns:

conformity test results.

Return type:

np.ndarray[bool]
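
The decision rule behind such a test can be sketched as follows, assuming unweighted scores: a new point is declared conformal when its nonconformity score does not exceed the finite-sample \((1-\alpha)\) quantile of the calibration scores. The function `is_conformal_sketch` is a hypothetical stand-in that operates on precomputed scores; it is not the library's implementation.

```python
import numpy as np

def is_conformal_sketch(calib_scores, new_scores, alpha):
    """Hypothetical helper: True where a new score passes the conformity test."""
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    n = calib.size
    # 1-indexed rank of the calibration score used as decision threshold
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    if rank > n:
        raise ValueError("alpha is too small for this calibration set size")
    threshold = calib[rank - 1]
    return np.asarray(new_scores, dtype=float) <= threshold

flags = is_conformal_sketch(
    calib_scores=[0.1, 0.2, 0.3, 0.4],
    new_scores=[0.25, 0.35],
    alpha=0.5,
)
```

The returned boolean array can be used directly as a mask, in the same way `results` is used in the anomaly detection example.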

set_nonconformity_scores(scores)

Setter of nonconformity scores. Can be used instead of calling fit() if the nonconformity scores are already computed.

Parameters: scores (ndarray) – nonconformity scores.