Conformalization
This module provides the framework for conformal prediction.
- class conformalization.ConformalPredictor(calibrator, predictor, splitter, method='cv+', train=True)
Bases:
object
Conformal predictor class.
- Parameters:
predictor (deel.puncc.api.prediction.BasePredictor | object) – underlying model to be conformalized. The model can directly be passed as argument if it already has fit and predict methods.
calibrator (deel.puncc.api.calibration.BaseCalibrator) – nonconformity computation strategy and set predictor.
splitter (deel.puncc.api.splitting.BaseSplitter) – fit/calibration split strategy. The splitter can be set to None if the underlying model is pretrained.
method (str) – method used to handle the ensemble prediction and calibration when the splitter is a K-fold-like strategy. Defaults to ‘cv+’, which follows the cv+ procedure.
train (bool) – if False, prediction model(s) will not be (re)trained. Defaults to True.
Warning
If a K-Fold-like splitter is provided with the train attribute set to False, an exception is raised. The models have to be trained during the call to fit().
Conformal Regression example:
    from sklearn import linear_model
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split

    from deel.puncc.api.conformalization import ConformalPredictor
    from deel.puncc.api.prediction import BasePredictor
    from deel.puncc.api.calibration import BaseCalibrator
    from deel.puncc.api.splitting import KFoldSplitter
    from deel.puncc.api import nonconformity_scores
    from deel.puncc.api import prediction_sets

    # Generate a random regression problem
    X, y = make_regression(n_samples=1000, n_features=4, n_informative=2,
                           random_state=0, shuffle=False)

    # Split data into train and test
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=.2, random_state=0
    )

    # Regression linear model
    model = linear_model.LinearRegression()

    # Definition of a predictor. Note that it is not required to wrap
    # the model here because it already implements fit and predict methods
    predictor = BasePredictor(model)

    # Definition of a calibrator, built for a given nonconformity scores
    # and a procedure to build the prediction sets
    calibrator = BaseCalibrator(nonconf_score_func=nonconformity_scores.mad,
                                pred_set_func=prediction_sets.constant_interval)

    # Definition of a K-fold splitter that produces
    # 20 folds of fit/calibration
    kfold_splitter = KFoldSplitter(K=20, random_state=42)

    # Conformal predictor requires the three components instantiated
    # previously. Our choice of calibrator and splitter yields a cv+ procedure
    conformal_predictor = ConformalPredictor(predictor=predictor,
                                             calibrator=calibrator,
                                             splitter=kfold_splitter,
                                             train=True)

    # Fit model and compute nonconformity scores
    conformal_predictor.fit(X_train, y_train)

    # The lower and upper bounds of the prediction interval are predicted
    # by the call to predict on the new data w.r.t a risk level of 10%.
    # Besides, there is no aggregate point prediction in cv+ so y_pred is None.
    y_pred, y_lower, y_upper = conformal_predictor.predict(X_test, alpha=.1)
- fit(X, y, use_cached=False, **kwargs)
Fit the model(s) and estimate the nonconformity scores.
If the splitter is an instance of deel.puncc.api.splitting.KFoldSplitter, the fit operates on each fold separately. Thereafter, the predictions and nonconformity scores are combined according to an aggregation method (cv+ by default).
- Parameters:
X (Iterable) – features.
y (Iterable) – labels.
use_cached (bool) – if True, the previously computed nonconformity scores (if any) are added to the pool estimated in the current call to fit. The aggregation follows the CV+ procedure.
kwargs (dict) – configuration options for the training.
- Raises:
RuntimeError – inconsistencies between the train status of the model(s) and the train class attribute.
- Return type:
None
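The fold-wise behaviour described for fit() can be illustrated with a minimal, library-free sketch. It assumes contiguous folds, a toy model that predicts the mean of its fit labels, and the absolute error as nonconformity score. The helper names (kfold_indices, fit_mean_model, fit_per_fold) are hypothetical and are not part of puncc.

```python
from statistics import fmean


def kfold_indices(n, k):
    """Yield (fit_idx, calib_idx) pairs for k contiguous folds over n points."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        calib_idx = list(range(start, start + size))
        fit_idx = [i for i in range(n) if i < start or i >= start + size]
        yield fit_idx, calib_idx
        start += size


def fit_mean_model(y_fit):
    """Toy 'model' (assumption): always predicts the mean of the fit labels."""
    mu = fmean(y_fit)
    return lambda x: mu


def fit_per_fold(X, y, k):
    """Fit one model per fold and score it on the held-out calibration part."""
    models, scores = {}, {}
    for fold, (fit_idx, calib_idx) in enumerate(kfold_indices(len(y), k)):
        model = fit_mean_model([y[i] for i in fit_idx])
        models[fold] = model
        # Nonconformity score: absolute error on the calibration points.
        scores[fold] = [abs(y[i] - model(X[i])) for i in calib_idx]
    return models, scores


X = [0, 1, 2, 3, 4, 5]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
models, scores = fit_per_fold(X, y, k=3)
print(scores)
```

Each fold contributes one fitted model and one list of out-of-fold scores, both keyed by the fold index.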
- get_nonconformity_scores()
Getter for the nonconformity scores computed on the calibration set(s).
- Returns:
dictionary of nonconformity scores indexed by the fold index.
- Return type:
dict
- Raises:
RuntimeError – fit() needs to be called before get_nonconformity_scores().
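As an illustration of how the returned dictionary might be consumed, the sketch below summarizes and pools per-fold scores. The scores_by_fold values are made-up placeholders standing in for the output of get_nonconformity_scores(); the container type actually returned for each fold is not specified here and is assumed to be a plain sequence.

```python
# Hypothetical stand-in for conformal_predictor.get_nonconformity_scores():
# one sequence of scores per fold index.
scores_by_fold = {
    0: [0.2, 0.5, 0.1],
    1: [0.4, 0.3, 0.9],
    2: [0.7, 0.6, 0.8],
}

# Per-fold summary: number of calibration points and largest score.
summary = {
    fold: {"n": len(scores), "max": max(scores)}
    for fold, scores in scores_by_fold.items()
}

# Pool all folds into one sorted list, e.g. to inspect the score distribution.
pooled = sorted(s for scores in scores_by_fold.values() for s in scores)

for fold, stats in summary.items():
    print(f"fold {fold}: n={stats['n']}, max score={stats['max']}")
print(f"pooled: n={len(pooled)}, median={pooled[len(pooled) // 2]}")
```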
- get_weights()
Getter for the weights associated with the calibration samples.
- Returns:
dictionary of weights indexed by the fold index.
- Return type:
dict
- Raises:
RuntimeError – fit() needs to be called before get_weights().
- static load(path)
Load conformal predictor from a file.
- Parameters:
path (str) – file path.
- Returns:
loaded conformal predictor instance.
- Return type:
ConformalPredictor
- predict(X, alpha, correction_func=<function bonferroni>)
Predict point and interval estimates for X data.
- Parameters:
X (Iterable) – features.
alpha (float) – significance level (max miscoverage target).
correction_func (Callable) – correction for multiple hypothesis testing in the case of multivariate regression. Defaults to Bonferroni correction.
- Returns:
(y_pred, y_lower, y_higher) or (y_pred, pred_set).
- Return type:
Union[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray, np.ndarray]]
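To make the roles of alpha and correction_func concrete, here is a minimal sketch of a split-conformal constant-width interval with a Bonferroni-adjusted significance level. bonferroni_sketch and split_conformal_interval are illustrative names, not the library's functions, and the finite-sample quantile rule is the standard split-conformal one, assumed here rather than taken from the library's source.

```python
import math


def bonferroni_sketch(alpha, nvars=1):
    """Divide the miscoverage budget evenly across nvars output dimensions."""
    return alpha / nvars


def split_conformal_interval(y_pred, calib_scores, alpha):
    """Return (lower, upper) = y_pred -/+ the conformal quantile of the scores."""
    n = len(calib_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points the interval is unbounded.
    q = math.inf if rank > n else sorted(calib_scores)[rank - 1]
    return y_pred - q, y_pred + q


# Made-up absolute residuals from a calibration set of 20 points.
calib_scores = [float(i) for i in range(1, 21)]

# Single output, 10% miscoverage target.
single = split_conformal_interval(100.0, calib_scores, alpha=0.1)

# Two outputs: each dimension gets alpha / 2, which widens each interval.
alpha_per_dim = bonferroni_sketch(0.1, nvars=2)
corrected = split_conformal_interval(100.0, calib_scores, alpha=alpha_per_dim)

print(single, corrected)
```

A smaller alpha asks for higher coverage, so the corrected interval is never narrower than the uncorrected one.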
- save(path, save_data=True)
Serialize current conformal predictor and write it to a file.
- Parameters:
path (str) – File path.
save_data (bool) – If True, save the custom data used to fit/calibrate the model.
- class conformalization.CrossValCpAggregator(K, method='cv+')
Bases:
object
This class aggregates the predictions and calibrations obtained from the K folds.
- Parameters:
K (int) – number of folds
_predictors (dict) – collection of predictors fitted on the K-folds
_calibrators (dict) – collection of calibrators fitted on the K-folds
method (str) – method to handle the ensemble prediction and calibration, defaults to ‘cv+’.
- append_calibrator(key, calibrator)
Add a calibrator to the K-fold calibrators dictionary.
- Parameters:
key (int) – key of the calibrator.
calibrator (BaseCalibrator) – calibrator to be appended.
- append_predictor(key, predictor)
Add a predictor to the K-fold predictors dictionary.
- Parameters:
key (int) – key of the predictor.
predictor (BasePredictor|DualPredictor) – predictor to be appended.
- get_nonconformity_scores()
Get a dictionary of residuals computed on the K-folds.
- Returns:
dictionary of residuals indexed by the K-fold number.
- Return type:
dict
- get_weights()
Get a dictionary of normalized weights computed on the K-folds.
- Returns:
dictionary of normalized weights indexed by the K-fold number.
- Return type:
dict
- predict(X, alpha, correction_func=<function bonferroni>)
Predict point, interval and variability estimates for X data.
- Parameters:
X (Iterable) – features.
alpha (float) – significance level (max miscoverage target).
correction_func (Callable) – correction for multiple hypothesis testing in the case of multivariate regression. Defaults to Bonferroni correction.
- Returns:
y_pred, y_lower, y_higher.
- Return type:
Union[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray, np.ndarray]]
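The cv+ aggregation can be sketched without the library as follows. The sketch uses the usual CV+ interval definition, in which the lower and upper bounds are empirical quantiles of fold-model predictions shifted by out-of-fold residuals. It is an assumption that this class implements exactly this rule, and q_plus, q_minus and cv_plus_interval are hypothetical helper names. The toy fold models and residuals are made up.

```python
import math


def q_plus(values, alpha):
    """The ceil((1 - alpha) * (n + 1))-th smallest value, or +inf if out of range."""
    n = len(values)
    rank = math.ceil((1 - alpha) * (n + 1))
    return math.inf if rank > n else sorted(values)[rank - 1]


def q_minus(values, alpha):
    """Lower counterpart of q_plus, obtained by symmetry."""
    return -q_plus([-v for v in values], alpha)


def cv_plus_interval(x, models, fold_of, residuals, alpha):
    """CV+ interval at x from K fold models and n out-of-fold residuals.

    fold_of[i] is the fold in which calibration point i was held out.
    """
    # For each calibration point, predict x with the model that did not see it.
    preds = [models[fold_of[i]](x) for i in range(len(residuals))]
    lower = q_minus([p - r for p, r in zip(preds, residuals)], alpha)
    upper = q_plus([p + r for p, r in zip(preds, residuals)], alpha)
    return lower, upper


# Toy setup (assumption): three constant fold models, two held-out points each.
models = {0: lambda x: 4.5, 1: lambda x: 3.5, 2: lambda x: 2.5}
fold_of = [0, 0, 1, 1, 2, 2]
residuals = [3.5, 2.5, 0.5, 0.5, 2.5, 3.5]

lower, upper = cv_plus_interval(0.0, models, fold_of, residuals, alpha=0.3)
print(lower, upper)
```

Because the bounds are quantiles over all fold models, this rule yields an interval but no single aggregated point prediction.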