Conformalization
This module provides the canvas for conformal prediction.
- class conformalization.ConformalPredictor(calibrator, predictor, splitter, method='cv+', train=True)
Bases: object
Conformal predictor class.
- Parameters:
predictor (deel.puncc.api.prediction.BasePredictor | object) – underlying model to be conformalized. The model can directly be passed as argument if it already has fit and predict methods.
calibrator (deel.puncc.api.calibration.BaseCalibrator) – nonconformity computation strategy and set predictor.
splitter (deel.puncc.api.splitting.BaseSplitter) – fit/calibration split strategy. The splitter can be set to None if the underlying model is pretrained.
method (str) – method used to handle the ensemble prediction and calibration when the splitter is a K-fold-like strategy. Defaults to ‘cv+’, which follows the cv+ procedure.
train (bool) – if False, prediction model(s) will not be (re)trained. Defaults to True.
Warning
If a K-fold-like splitter is provided with the train attribute set to False, an exception is raised: the models have to be trained during the call to fit().
Conformal Regression example:
    from sklearn import linear_model
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split

    from deel.puncc.api.conformalization import ConformalPredictor
    from deel.puncc.api.prediction import BasePredictor
    from deel.puncc.api.calibration import BaseCalibrator
    from deel.puncc.api.splitting import KFoldSplitter
    from deel.puncc.api import nonconformity_scores
    from deel.puncc.api import prediction_sets

    # Generate a random regression problem
    X, y = make_regression(n_samples=1000, n_features=4, n_informative=2,
                           random_state=0, shuffle=False)

    # Split data into train and test
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=.2, random_state=0
    )

    # Regression linear model
    model = linear_model.LinearRegression()

    # Definition of a predictor. Note that it is not required to wrap
    # the model here because it already implements fit and predict methods
    predictor = BasePredictor(model)

    # Definition of a calibrator, built for a given nonconformity scores
    # and a procedure to build the prediction sets
    calibrator = BaseCalibrator(nonconf_score_func=nonconformity_scores.mad,
                                pred_set_func=prediction_sets.constant_interval)

    # Definition of a K-fold splitter that produces
    # 20 folds of fit/calibration
    kfold_splitter = KFoldSplitter(K=20, random_state=42)

    # Conformal predictor requires the three components instantiated
    # previously. Our choice of calibrator and splitter yields a cv+ procedure
    conformal_predictor = ConformalPredictor(predictor=predictor,
                                             calibrator=calibrator,
                                             splitter=kfold_splitter,
                                             train=True)

    # Fit model and compute nonconformity scores
    conformal_predictor.fit(X_train, y_train)

    # The lower and upper bounds of the prediction interval are predicted
    # by the call to predict on the new data w.r.t a risk level of 10%.
    # Besides, there is no aggregate point prediction in cv+ so y_pred is None.
    y_pred, y_lower, y_upper = conformal_predictor.predict(X_test, alpha=.1)
- fit(X, y, use_cached=False, **kwargs)
Fit the model(s) and estimate the nonconformity scores.
If the splitter is an instance of deel.puncc.api.splitting.KFoldSplitter, the fit operates on each fold separately. Thereafter, the predictions and nonconformity scores are combined according to an aggregation method (cv+ by default).
- Parameters:
X (Iterable) – features.
y (Iterable) – labels.
use_cached (bool) – if set, the previously computed nonconformity scores (if any) are added to the pool estimated in the current call to fit. The aggregation follows the CV+ procedure.
kwargs (dict) – configuration options for the training.
- Raises:
RuntimeError – inconsistencies between the train status of the model(s) and the train class attribute.
- Return type:
None
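The per-fold behaviour described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helpers `kfold_indices` and `fit_per_fold` are hypothetical, the "model" is just the mean of the labels, and the nonconformity score is assumed to be the absolute residual.

```python
from statistics import mean


def kfold_indices(n_samples, n_folds):
    """Yield (fit_idx, calib_idx) pairs; fold k calibrates on samples i with i % n_folds == k."""
    for k in range(n_folds):
        calib_idx = [i for i in range(n_samples) if i % n_folds == k]
        fit_idx = [i for i in range(n_samples) if i % n_folds != k]
        yield fit_idx, calib_idx


def fit_per_fold(y, n_folds):
    """Fit a constant (mean) model on each fit part and score its calibration part."""
    models, scores = {}, {}
    for k, (fit_idx, calib_idx) in enumerate(kfold_indices(len(y), n_folds)):
        # "Training" is reduced to averaging the labels of the fit part.
        models[k] = mean(y[i] for i in fit_idx)
        # Nonconformity score: absolute residual on the held-out calibration part.
        scores[k] = [abs(y[i] - models[k]) for i in calib_idx]
    return models, scores


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
models, scores = fit_per_fold(y, n_folds=3)
print(sorted(scores))  # the fold indices serve as dictionary keys
```

Each fold keeps its own model and its own list of scores, which is why the getters below return dictionaries indexed by fold.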
- get_nonconformity_scores()
Getter for the nonconformity scores computed on the calibration set(s).
- Returns:
dictionary of nonconformity scores indexed by the fold index.
- Return type:
dict
- Raises:
RuntimeError – fit() needs to be called before get_nonconformity_scores().
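To show how such scores are typically used, here is a hedged sketch of the standard split-conformal quantile applied to one fold's scores. The function `conformal_quantile` and the sample values are hypothetical, and the library's own calibrator may compute the quantile differently.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((1 - alpha) * (n + 1))-th smallest score, or inf if that rank exceeds n."""
    n = len(scores)
    rank = math.ceil((1 - alpha) * (n + 1))
    if rank > n:
        # Too few calibration samples for this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[rank - 1]


# Assumed shape of the getter's output: one collection of scores per fold index.
scores_by_fold = {0: [0.5, 1.0, 0.2, 2.0, 0.7, 1.5, 0.1, 0.9, 1.2]}
half_width = conformal_quantile(scores_by_fold[0], alpha=0.25)

# A constant-width interval around a point prediction.
y_hat = 10.0
print(y_hat - half_width, y_hat + half_width)
```

With nine scores and alpha = 0.25, the rank is ceil(0.75 * 10) = 8, so the half-width is the eighth smallest score.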
- get_weights()
Getter for the weights associated with calibration samples.
- Returns:
dictionary of weights indexed by the fold index.
- Return type:
dict
- Raises:
RuntimeError – fit() needs to be called before get_weights().
- static load(path)
Load conformal predictor from a file.
- Parameters:
path (str) – file path.
- Returns:
loaded conformal predictor instance.
- Return type:
ConformalPredictor
- predict(X, alpha, correction_func=<function bonferroni>)
Predict point and interval estimates for X data.
- Parameters:
X (Iterable) – features.
alpha (float) – significance level (max miscoverage target).
correction_func (Callable) – correction for multiple hypothesis testing in the case of multivariate regression. Defaults to Bonferroni correction.
- Returns:
(y_pred, y_lower, y_higher) or (y_pred, pred_set).
- Return type:
Union[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray, np.ndarray]]
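As an illustration of what a Bonferroni correction does in the multivariate case, the sketch below splits the overall significance level evenly across the output dimensions. The function `bonferroni_sketch` and its `n_outputs` parameter are hypothetical names used for this example, not the library's own `bonferroni` function.

```python
def bonferroni_sketch(alpha, n_outputs):
    """Split the overall miscoverage budget evenly across output dimensions."""
    if n_outputs < 1:
        raise ValueError("n_outputs must be a positive integer")
    return alpha / n_outputs


# With 4 regression targets and an overall level of 0.1,
# each per-target interval would be built at level 0.025.
per_output_alpha = bonferroni_sketch(alpha=0.1, n_outputs=4)
print(per_output_alpha)
```

By the union bound, the probability that at least one of the per-target intervals misses its target is then at most alpha.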
- save(path, save_data=True)
Serialize current conformal predictor and write it to a file.
- Parameters:
path (str) – File path.
save_data (bool) – If True, save the custom data used to fit/calibrate the model.
- class conformalization.CrossValCpAggregator(K, method='cv+')
Bases: object
This class aggregates predictions and calibrations from the different K-folds.
- Parameters:
K (int) – number of folds
_predictors (dict) – collection of predictors fitted on the K-folds
_calibrators (dict) – collection of calibrators fitted on the K-folds
method (str) – method to handle the ensemble prediction and calibration, defaults to ‘cv+’.
- append_calibrator(key, calibrator)
Add a calibrator to the K-fold calibrators dictionary.
- Parameters:
key (int) – key of the calibrator.
calibrator (BaseCalibrator) – calibrator to be appended.
- append_predictor(key, predictor)
Add a predictor to the K-fold predictors dictionary.
- Parameters:
key (int) – key of the predictor.
predictor (BasePredictor|DualPredictor) – predictor to be appended.
- get_nonconformity_scores()
Get a dictionary of residuals computed on the K-folds.
- Returns:
dictionary of residuals indexed by the K-fold number.
- Return type:
dict
- get_weights()
Get a dictionary of normalized weights computed on the K-folds.
- Returns:
dictionary of normalized weights indexed by the K-fold number.
- Return type:
dict
- predict(X, alpha, correction_func=<function bonferroni>)
Predict point, interval and variability estimates for X data.
- Parameters:
X (Iterable) – features.
alpha (float) – significance level (max miscoverage target).
correction_func (Callable) – correction for multiple hypothesis testing in the case of multivariate regression. Defaults to Bonferroni correction.
- Returns:
y_pred, y_lower, y_higher.
- Return type:
Union[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray, np.ndarray]]
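For intuition about the ‘cv+’ method, the following sketch builds a CV+ interval for a single test point from per-fold predictions and absolute residuals. It follows the published CV+ construction and is not taken from the library's code: `cv_plus_interval` and the input dictionaries are hypothetical.

```python
import math


def cv_plus_interval(fold_predictions, fold_scores, alpha):
    """CV+ interval for one test point.

    fold_predictions[k]: prediction at the test point from the model that
        was fitted without fold k.
    fold_scores[k]: absolute residuals of that model on fold k.
    """
    lower_candidates, upper_candidates = [], []
    for k, scores in fold_scores.items():
        for r in scores:
            lower_candidates.append(fold_predictions[k] - r)
            upper_candidates.append(fold_predictions[k] + r)
    n = len(lower_candidates)
    # Lower bound: floor(alpha * (n + 1))-th smallest lower candidate.
    lo_rank = math.floor(alpha * (n + 1))
    # Upper bound: ceil((1 - alpha) * (n + 1))-th smallest upper candidate.
    hi_rank = math.ceil((1 - alpha) * (n + 1))
    y_lower = sorted(lower_candidates)[lo_rank - 1] if lo_rank >= 1 else -math.inf
    y_upper = sorted(upper_candidates)[hi_rank - 1] if hi_rank <= n else math.inf
    return y_lower, y_upper


fold_predictions = {0: 10.0, 1: 11.0}
fold_scores = {0: [0.5, 1.0, 2.0, 3.0], 1: [0.5, 1.5, 2.5]}
y_lower, y_upper = cv_plus_interval(fold_predictions, fold_scores, alpha=0.25)
print(y_lower, y_upper)
```

Because each calibration residual is paired with the prediction of the model that did not see it, the bounds come from order statistics of the candidates rather than from a single averaged point prediction.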