
This module implements the core calibrator, providing a structure to estimate the nonconformity scores on the calibration set and to compute the prediction sets.

class calibration.BaseCalibrator(*, nonconf_score_func, pred_set_func, weight_func=None)

Bases: object

BaseCalibrator offers a framework to compute user-defined nonconformity scores on calibration dataset(s) (fit()) and to use them to construct and/or calibrate prediction sets (calibrate()).

  • nonconf_score_func (callable) – nonconformity score function.

  • pred_set_func (callable) – prediction set construction function.

  • weight_func (callable) – function that takes as argument an array of features X and returns associated “conformality” weights, defaults to None.


NotImplementedError – provided weight_method is not suitable.

Regression calibrator example:

Consider a pretrained model \(\hat{f}\), a calibration dataset \((X_{calib}, y_{calib})\) and a test dataset \((X_{test}, y_{test})\). The model \(\hat{f}\) generates predictions on the calibration and test sets:

\[y_{pred, calib}=\hat{f}(X_{calib})\]
\[y_{pred, test}=\hat{f}(X_{test})\]

Two functions need to be defined before instantiating the BaseCalibrator: a nonconformity score function and a function that defines how the prediction sets are computed. In the example below, these are implemented from scratch, but a collection of ready-to-use nonconformity scores and prediction sets is provided in the modules nonconformity_scores and prediction_sets, respectively.

from deel.puncc.api.calibration import BaseCalibrator
import numpy as np

# First, we define a nonconformity score function that takes as argument
# the predicted values y_pred = model(X) and the true labels y_true. In
# this example, we reimplement the mean absolute deviation that is
# already defined in `deel.puncc.api.nonconformity_scores.mad`
def nonconformity_function(y_pred, y_true):
    return np.abs(y_pred - y_true)

# Prediction sets are computed based on point predictions and
# the quantiles of nonconformity scores. The function below returns a
# fixed size interval around the point predictions.
def prediction_set_function(y_pred, scores_quantile):
    y_lo = y_pred - scores_quantile
    y_hi = y_pred + scores_quantile
    return y_lo, y_hi

# The calibrator is instantiated by passing the two functions defined
# above to the constructor.
calibrator = BaseCalibrator(
    nonconf_score_func=nonconformity_function,
    pred_set_func=prediction_set_function,
)

# Generate dummy data and predictions
y_pred_calib = np.random.rand(1000)
y_true_calib = np.random.rand(1000)
y_pred_test = np.random.rand(1000)

# The nonconformity scores are computed by calling the `fit` method
# on the calibration dataset.
calibrator.fit(y_pred=y_pred_calib, y_true=y_true_calib)

# The lower and upper bounds of the prediction interval are then returned
# by the call to calibrate on the new data w.r.t a risk level of 10%.
y_pred_low, y_pred_high = calibrator.calibrate(y_pred=y_pred_test, alpha=.1)
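
The coverage behaviour of this example can be checked without the library. The sketch below re-creates the same steps in plain NumPy: absolute-deviation scores, a finite-sample corrected quantile, and fixed-width intervals. The synthetic data, the seed, and the rank formula ceil((1 - alpha)(n + 1)) are assumptions of this illustration, not a description of what BaseCalibrator does internally.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
n_calib, n_test, alpha = 1000, 1000, 0.1

# Synthetic, exchangeable data: noisy "predictions" around the true labels.
y_true_calib = rng.normal(size=n_calib)
y_pred_calib = y_true_calib + rng.normal(scale=0.5, size=n_calib)
y_true_test = rng.normal(size=n_test)
y_pred_test = y_true_test + rng.normal(scale=0.5, size=n_test)

# Absolute-deviation nonconformity scores on the calibration set.
scores = np.abs(y_pred_calib - y_true_calib)

# Finite-sample corrected rank of the score used as interval half-width.
rank = int(np.ceil((1 - alpha) * (n_calib + 1)))
scores_quantile = np.sort(scores)[rank - 1]

# Fixed-width intervals around the test predictions.
y_lo = y_pred_test - scores_quantile
y_hi = y_pred_test + scores_quantile

# Fraction of test labels that fall inside their interval.
coverage = np.mean((y_true_test >= y_lo) & (y_true_test <= y_hi))
print(f"empirical coverage: {coverage:.3f} (target: {1 - alpha:.2f})")
```

The empirical coverage fluctuates around the target from one draw to the next; the guarantee holds on average over calibration and test samples.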

static barber_weights(weights)

Compute and normalize inference weights of the nonconformity distribution based on Barber et al.


weights (ndarray) – weights assigned to the samples.


normalized weights.

Return type:
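
In weighted conformal prediction, the calibration weights are usually normalized together with the weight of the test point so that all of them sum to one. The sketch below shows one plausible form of such a normalization. The helper name `normalize_weights` and the default test-point weight of 1 are assumptions made for illustration; they are not taken from the library.

```python
import numpy as np

def normalize_weights(weights, test_weight=1.0):
    """Normalize calibration weights jointly with one test-point weight.

    Hypothetical helper: returns an array of length n + 1 whose last
    entry is the normalized weight of the test point.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or test_weight < 0:
        raise ValueError("weights must be non-negative")
    total = weights.sum() + test_weight
    return np.append(weights, test_weight) / total

norm = normalize_weights([1.0, 2.0, 1.0])
print(norm)        # four entries: three calibration points + test point
print(norm.sum())  # the normalized weights sum to one
```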


calibrate(*, alpha, y_pred, weights=None, correction=<function bonferroni>)

Compute calibrated prediction sets for new examples w.r.t a significance level \(\alpha\).

  • alpha (float) – significance level (max miscoverage target).

  • y_pred (Iterable) – predicted values.

  • weights (Iterable) – weights to be associated to the nonconformity scores. Defaults to None when all the scores are equiprobable.

  • correction (Callable) – correction for multiple hypothesis testing in the case of multivariate regression. Defaults to Bonferroni correction.


prediction set. In case of regression, returns (y_lower, y_upper). In case of classification, returns (classes,).

Return type:


  • RuntimeError – calibrate() called before fit().

  • ValueError – failed check on alpha w.r.t size of the calibration set.
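
To illustrate the role of `correction` in multivariate regression, the sketch below applies a Bonferroni adjustment: the significance level is divided by the number of output variables, and one score quantile is computed per output at the adjusted level. By the union bound, the joint miscoverage then stays at or below alpha. The helper `bonferroni_alpha` and the synthetic scores are hypothetical and stand in for whatever the library does internally.

```python
import numpy as np

def bonferroni_alpha(alpha, n_outputs):
    """Hypothetical helper: per-output significance level."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    if n_outputs < 1:
        raise ValueError("n_outputs must be at least 1")
    return alpha / n_outputs

rng = np.random.default_rng(0)
# Synthetic nonconformity scores: 500 calibration points, 2 outputs.
scores = np.abs(rng.normal(size=(500, 2)))

alpha = 0.1
alpha_per_output = bonferroni_alpha(alpha, scores.shape[1])

# One finite-sample corrected quantile per output variable.
n = scores.shape[0]
rank = int(np.ceil((1 - alpha_per_output) * (n + 1)))
quantiles = np.sort(scores, axis=0)[rank - 1]
print(alpha_per_output, rank, quantiles)
```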

compute_quantile(*, alpha, weights=None, correction=<function bonferroni>)

Compute quantile of scores w.r.t a significance level \(\alpha\).

  • alpha (float) – significance level (max miscoverage target).

  • weights (Iterable) – weights to be associated to the nonconformity scores. Defaults to None when all the scores are equiprobable.

  • correction (Callable) – correction for multiple hypothesis testing in the case of multivariate regression. Defaults to Bonferroni correction.



Return type:


  • RuntimeError – compute_quantile() called before fit().

  • ValueError – failed check on alpha w.r.t size of the calibration set.
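
The check on alpha with respect to the calibration set size can be understood from the unweighted case. With n calibration scores, the usual finite-sample quantile is the score of rank ceil((1 - alpha)(n + 1)); that rank exceeds n whenever alpha < 1 / (n + 1), in which case no finite quantile exists. The function below is a minimal sketch of this reasoning; `conformal_quantile` is a hypothetical name, and the exact check performed by the library may differ.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Unweighted finite-sample quantile of nonconformity scores."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    rank = int(np.ceil((1 - alpha) * (n + 1)))
    if rank > n:
        # Equivalent to alpha < 1 / (n + 1): too few calibration points.
        raise ValueError(
            f"alpha={alpha} is too small for {n} calibration scores"
        )
    return scores[rank - 1]

scores = np.arange(1, 8)                    # seven scores: 1, 2, ..., 7
q = conformal_quantile(scores, alpha=0.25)  # rank ceil(0.75 * 8) = 6
print(q)
```

Calling `conformal_quantile(scores, alpha=0.1)` on the same seven scores raises a ValueError, because the required rank would be 8.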

fit(*, y_true, y_pred)

Compute and store nonconformity scores on the calibration set.

  • y_true (Iterable) – true labels.

  • y_pred (Iterable) – predicted values.

Return type:



Getter of computed nonconformity scores on the calibration set.


nonconformity scores.

Return type:



Getter of normalized weights associated to the nonconformity scores on the calibration set.


normalized weights of nonconformity scores.

Return type:



Setter of normalized weights associated to the nonconformity scores on the calibration set.


norm_weights (ndarray) – normalized weights array

Return type:


class calibration.CvPlusCalibrator(kfold_calibrators)

Bases: object

Meta calibrator that combines the estimations of nonconformity scores by each K-Fold calibrator and produces associated prediction intervals based on CV+.


kfold_calibrators_dict (dict) – collection of calibrators for each K-fold (disjoint calibration subsets). Each calibrator must already have estimated the nonconformity scores on its associated calibration fold.

calibrate(*, X, kfold_predictors_dict, alpha)

Compute calibrated prediction intervals for new examples X.

  • X (Iterable) – test features.

  • kfold_predictors_dict (dict) – dictionary of predictors trained on each fold.

  • alpha (float) – significance level (maximum miscoverage target).


y_lower, y_upper.

Return type:
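
As a rough illustration of how CV+ combines the folds, the sketch below builds intervals from per-fold test predictions and per-fold residuals. For every calibration point i in fold k, it forms the values (prediction of the model trained without fold k) minus and plus the residual of point i, then takes the floor(alpha(n + 1))-th smallest of the lower values and the ceil((1 - alpha)(n + 1))-th smallest of the upper values. The function name `cv_plus_interval` and its inputs are hypothetical; the library works with calibrator and predictor dictionaries instead.

```python
import numpy as np

def cv_plus_interval(fold_preds_test, fold_residuals, alpha):
    """CV+ style intervals from per-fold predictions and residuals.

    fold_preds_test[k]: test predictions of the model trained without
        fold k, shape (n_test,).
    fold_residuals[k]: absolute residuals of that model on fold k,
        shape (n_k,).
    """
    lo_terms, hi_terms = [], []
    for preds, res in zip(fold_preds_test, fold_residuals):
        preds = np.asarray(preds, dtype=float)
        res = np.asarray(res, dtype=float)
        # Broadcast to shape (n_test, n_k): one column per calibration point.
        lo_terms.append(preds[:, None] - res[None, :])
        hi_terms.append(preds[:, None] + res[None, :])
    lo = np.sort(np.concatenate(lo_terms, axis=1), axis=1)
    hi = np.sort(np.concatenate(hi_terms, axis=1), axis=1)

    n = lo.shape[1]  # total number of calibration points over all folds
    k_lo = int(np.floor(alpha * (n + 1)))
    k_hi = int(np.ceil((1 - alpha) * (n + 1)))
    if k_lo < 1 or k_hi > n:
        raise ValueError("alpha is too small for the number of residuals")
    return lo[:, k_lo - 1], hi[:, k_hi - 1]

# Two folds, two test points; both fold models predict 0 and 10 here.
fold_preds_test = [np.array([0.0, 10.0]), np.array([0.0, 10.0])]
fold_residuals = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0, 7.0])]
y_lower, y_upper = cv_plus_interval(fold_preds_test, fold_residuals, alpha=0.25)
print(y_lower, y_upper)
```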



Check if all calibrators have already been fitted.


RuntimeError – one or more of the calibrators did not estimate the nonconformity scores.

Return type:


class calibration.ScoreCalibrator(nonconf_score_func, weight_func=None)

Bases: object

ScoreCalibrator offers a framework to compute user-defined scores on a calibration dataset (fit()) and to test the conformity of new data points (is_conformal()) with respect to a significance (error) level \(\alpha\). Such a calibrator can be used, for example, to calibrate the decision threshold of anomaly detection scores.

  • nonconf_score_func (callable) – nonconformity score function.

  • weight_func (callable) – function that takes as argument an array of data points and returns associated “conformality” weights, defaults to None.

Anomaly detection example:

Consider the two moons dataset. We want to detect anomalous points in a new sample generated following a uniform distribution. The LOF algorithm is used to obtain anomaly scores; then a ScoreCalibrator is instantiated to decide which scores are conformal (not anomalies) with respect to a significance level \(\alpha\).

import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor
import matplotlib.pyplot as plt

from deel.puncc.api.calibration import ScoreCalibrator

# First, we generate the two moons dataset
dataset = 4*make_moons(n_samples=1000, noise=0.05,
    random_state=0)[0] - np.array([0.5, 0.25])

# Split data into proper fitting and calibration sets
fit_set, calib_set = train_test_split(dataset, train_size=.8)

# Generate new data points
rng = np.random.RandomState(42)
new_samples = rng.uniform(low=-6, high=8, size=(200, 2))

# Instantiate the LOF anomaly detection algorithm
algorithm = LocalOutlierFactor(n_neighbors=35, novelty=True)

# Fit the LOF on the proper fitting dataset
algorithm.fit(X=fit_set)

# The nonconformity scores are defined as the LOF (anomaly) scores.
# By default, `score_samples` returns the opposite of the LOF scores.
ncf = lambda X: -algorithm.score_samples(X)

# The ScoreCalibrator is instantiated by passing the LOF score function
# to the constructor
cad = ScoreCalibrator(nonconf_score_func=ncf)

# The LOF scores are computed by calling the `fit` method
# on the calibration dataset
cad.fit(z=calib_set)

# We set the target false detection rate to 1%
alpha = .01

# The method `is_conformal` is called on the new data points
# to test which are conformal (not anomalous) and which are not
results = cad.is_conformal(z=new_samples, alpha=alpha)
not_anomalies = new_samples[results]
anomalies = new_samples[np.invert(results)]

# Plot the results
plt.scatter(calib_set[:,0], calib_set[:,1],
            s=10, label="Inliers")
plt.scatter(not_anomalies[:, 0], not_anomalies[:, 1], s=40, marker="x",
            color="blue", label="Normal")
plt.scatter(anomalies[:, 0], anomalies[:, 1], s=40, marker="x",
            color="red", label="Anomaly")
plt.legend(loc="lower left")

Compute and store nonconformity scores on the calibration set.


z (Iterable) – calibration dataset.


Getter of computed nonconformity scores on the calibration set.


nonconformity scores.

Return type:


is_conformal(z, alpha)

Test if new data points z are conformal. The test result is True if the new sample is conformal w.r.t a significance level \(\alpha\) and False otherwise.

  • z (Iterable) – new samples.

  • alpha (float) – significance level.


conformity test results.

Return type:
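
One standard way to phrase such a conformity test is through a conformal p-value: for a new score s and n calibration scores, p = (1 + number of calibration scores >= s) / (n + 1), and the point is declared conformal when p > alpha. The sketch below implements this formulation in plain NumPy for unweighted scores. The function name `conformal_p_values` is hypothetical, and the library may implement the test differently, for instance as a threshold on a score quantile.

```python
import numpy as np

def conformal_p_values(calib_scores, new_scores):
    """Conformal p-values of new scores against calibration scores."""
    calib = np.asarray(calib_scores, dtype=float)
    new = np.asarray(new_scores, dtype=float)
    # For each new score, count calibration scores at least as large.
    n_geq = (calib[None, :] >= new[:, None]).sum(axis=1)
    return (1 + n_geq) / (calib.size + 1)

calib_scores = np.arange(1, 8)   # seven calibration scores: 1, ..., 7
new_scores = [0.5, 6.0, 6.5]
alpha = 0.25

p_values = conformal_p_values(calib_scores, new_scores)
flags = p_values > alpha         # True means conformal (not anomalous)
print(p_values, flags)
```

Larger nonconformity scores yield smaller p-values, so only the most atypical new points fall at or below alpha and are flagged as anomalies.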



Setter of nonconformity scores. Can be used instead of calling fit() if the nonconformity scores are already computed.


scores (ndarray) – nonconformity scores.