Conformalization
This module provides the canvas for conformal prediction.
- class conformalization.ConformalPredictor(calibrator, predictor, splitter, method='cv+', train=True)
Bases: object
Conformal predictor class.
- Parameters:
predictor (deel.puncc.api.prediction.BasePredictor | object) – underlying model to be conformalized. The model can be passed directly as an argument if it already has fit and predict methods.
calibrator (deel.puncc.api.calibration.BaseCalibrator) – nonconformity score computation strategy and prediction set builder.
splitter (deel.puncc.api.splitting.BaseSplitter) – fit/calibration split strategy. The splitter can be set to None if the underlying model is pretrained.
method (str) – method used to handle the ensemble prediction and calibration when the splitter is a K-fold-like strategy. Defaults to 'cv+' to follow the CV+ procedure.
train (bool) – if False, the prediction model(s) will not be (re)trained. Defaults to True.
Warning
If a K-fold-like splitter is provided with the train attribute set to False, an exception is raised. The models have to be trained during the call to fit().
Conformal Regression example:
```python
from sklearn import linear_model
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

from deel.puncc.api.conformalization import ConformalPredictor
from deel.puncc.api.prediction import BasePredictor
from deel.puncc.api.calibration import BaseCalibrator
from deel.puncc.api.splitting import KFoldSplitter
from deel.puncc.api import nonconformity_scores
from deel.puncc.api import prediction_sets

# Generate a random regression problem
X, y = make_regression(n_samples=1000, n_features=4, n_informative=2,
                       random_state=0, shuffle=False)

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Regression linear model
model = linear_model.LinearRegression()

# Definition of a predictor. Wrapping is optional here because
# LinearRegression already implements fit and predict methods
predictor = BasePredictor(model)

# Definition of a calibrator, built from a nonconformity score function
# and a procedure to build the prediction sets
calibrator = BaseCalibrator(nonconf_score_func=nonconformity_scores.mad,
                            pred_set_func=prediction_sets.constant_interval)

# Definition of a K-fold splitter that produces
# 20 folds of fit/calibration
kfold_splitter = KFoldSplitter(K=20, random_state=42)

# Conformal predictor requires the three components instantiated
# previously. Our choice of calibrator and splitter yields a cv+ procedure
conformal_predictor = ConformalPredictor(predictor=predictor,
                                         calibrator=calibrator,
                                         splitter=kfold_splitter,
                                         train=True)

# Fit model and compute nonconformity scores
conformal_predictor.fit(X_train, y_train)

# The lower and upper bounds of the prediction interval are predicted
# by the call to predict on the new data w.r.t. a risk level of 10%.
# Besides, there is no aggregate point prediction in cv+ so y_pred is None.
y_pred, y_lower, y_upper = conformal_predictor.predict(X_test, alpha=.1)
```
- fit(X, y, use_cached=False, **kwargs)
Fit the model(s) and estimate the nonconformity scores.
If the splitter is an instance of deel.puncc.api.splitting.KFoldSplitter, the fit operates on each fold separately. Thereafter, the predictions and nonconformity scores are combined according to an aggregation method (cv+ by default).
- Parameters:
X (Iterable) – features.
y (Iterable) – labels.
use_cached (bool) – if True, adds the previously computed nonconformity scores (if any) to the pool estimated in the current call to fit. The aggregation follows the CV+ procedure.
kwargs (dict) – options passed to the model(s) for training.
- Raises:
RuntimeError – inconsistencies between the train status of the model(s) and the train class attribute.
- Return type:
None
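The CV+ aggregation mentioned above can be sketched in plain Python. This is a minimal illustration of the CV+ interval of Barber et al. (2021), assuming absolute-residual nonconformity scores; the function cv_plus_interval and its dict-of-folds inputs are hypothetical and are not part of the puncc API. For one test point, every calibration residual R_i is paired with the prediction of the model that did not see fold k(i), and the bounds are order statistics of those shifted predictions.

```python
import math


def cv_plus_interval(fold_preds, fold_scores, alpha):
    """Hypothetical CV+ interval for a single test point.

    fold_preds:  {fold k: prediction at x of the model trained without fold k}
    fold_scores: {fold k: absolute residuals |y_i - mu_{-k}(X_i)| on fold k}
    """
    lows, highs = [], []
    for k, residuals in fold_scores.items():
        mu = fold_preds[k]
        for r in residuals:
            lows.append(mu - r)   # candidate lower bounds
            highs.append(mu + r)  # candidate upper bounds
    n = len(lows)
    k_lo = math.floor(alpha * (n + 1))
    k_hi = math.ceil((1 - alpha) * (n + 1))
    if k_lo < 1 or k_hi > n:
        # Too few calibration points for this alpha: the interval is unbounded
        return -math.inf, math.inf
    lows.sort()
    highs.sort()
    # k_lo-th smallest lower candidate and k_hi-th smallest upper candidate
    return lows[k_lo - 1], highs[k_hi - 1]


preds = {0: 10.0, 1: 12.0}
scores = {0: [1.0, 2.0, 3.0, 4.0, 5.0], 1: [0.5, 1.5, 2.5, 3.5, 4.5]}
print(cv_plus_interval(preds, scores, alpha=0.2))  # (6.0, 15.5)
```

The bounds are not centered on a single aggregated point prediction, which is consistent with the Conformal Regression example above returning y_pred as None.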
- get_nonconformity_scores()
Getter for computed nonconformity scores on the calibration set(s).
- Returns:
dictionary of nonconformity scores indexed by the fold index.
- Return type:
dict
- Raises:
RuntimeError – fit() needs to be called before get_nonconformity_scores().
- get_weights()
Getter for the weights associated with calibration samples.
- Returns:
dictionary of weights indexed by the fold index.
- Return type:
dict
- Raises:
RuntimeError – fit() needs to be called before get_weights().
- static load(path)
Load a conformal predictor from a file.
- Parameters:
path (str) – file path.
- Returns:
loaded conformal predictor instance.
- Return type:
ConformalPredictor
- predict(X, alpha, correction_func=<function bonferroni>)
Predict point and interval estimates for X.
- Parameters:
X (Iterable) – features.
alpha (float) – significance level (max miscoverage target).
correction_func (Callable) – correction for multiple hypothesis testing in the case of multivariate regression. Defaults to Bonferroni correction.
- Returns:
(y_pred, y_lower, y_higher) or (y_pred, pred_set).
- Return type:
Union[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray, np.ndarray]]
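For multivariate regression, a Bonferroni correction splits the miscoverage budget evenly across the d outputs: each output gets its own interval at level alpha / d, so by the union bound the resulting box misses the target with probability at most alpha. The sketch below illustrates this idea with split conformal quantiles of absolute-residual scores; the helpers conformal_quantile and bonferroni_box and the per-output list layout are hypothetical and do not reproduce the signature of the library's bonferroni function.

```python
import math


def conformal_quantile(scores, alpha):
    """ceil((1 - alpha)(n + 1))-th smallest score, or +inf if n is too small."""
    n = len(scores)
    k = math.ceil((1 - alpha) * (n + 1))
    return math.inf if k > n else sorted(scores)[k - 1]


def bonferroni_box(y_pred, scores_per_output, alpha):
    """Hypothetical helper: per-output intervals with Bonferroni-corrected alpha."""
    d = len(y_pred)
    alpha_d = alpha / d  # Bonferroni: split the miscoverage budget evenly
    lower, upper = [], []
    for j in range(d):
        q = conformal_quantile(scores_per_output[j], alpha_d)
        lower.append(y_pred[j] - q)
        upper.append(y_pred[j] + q)
    return lower, upper


calib_scores = [list(range(1, 21)), [2 * s for s in range(1, 21)]]
print(bonferroni_box([0.0, 10.0], calib_scores, alpha=0.2))
# ([-19.0, -28.0], [19.0, 48.0])
```

Without the correction (alpha = 0.2 for each output), the first interval would use the 17th smallest score instead of the 19th: narrower, but no longer controlling the joint miscoverage.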
- save(path, save_data=True)
Serialize the current conformal predictor and write it to a file.
- Parameters:
path (str) – File path.
save_data (bool) – If True, save the custom data used to fit/calibrate the model.
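A save/load round trip of this kind can be sketched with the standard-library pickle module. Whether the library relies on pickle internally is an assumption here, and TinyConformalModel is a hypothetical stand-in rather than a puncc type; the sketch only shows how a save_data-like flag can drop the stored fit/calibration data while keeping the fitted nonconformity scores.

```python
import os
import pickle
import tempfile


class TinyConformalModel:
    """Hypothetical stand-in for a fitted conformal predictor."""

    def __init__(self, scores, calib_data=None):
        self.scores = list(scores)    # fitted nonconformity scores
        self.calib_data = calib_data  # data used to fit/calibrate

    def save(self, path, save_data=True):
        state = dict(self.__dict__)  # shallow copy: the live object is untouched
        if not save_data:
            state["calib_data"] = None  # keep the scores, drop the data
        with open(path, "wb") as f:
            pickle.dump(state, f)

    @staticmethod
    def load(path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        obj = TinyConformalModel.__new__(TinyConformalModel)
        obj.__dict__.update(state)
        return obj


model = TinyConformalModel([0.5, 1.2, 2.0], calib_data=([1, 2], [3, 4]))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cp.pkl")
    model.save(path, save_data=False)
    restored = TinyConformalModel.load(path)
print(restored.scores, restored.calib_data)  # [0.5, 1.2, 2.0] None
```

Only load files from trusted sources: unpickling can execute arbitrary code.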
- class conformalization.CrossValCpAggregator(K, method='cv+')
Bases: object
This class aggregates predictions and calibrations from the different K folds.
- Parameters:
K (int) – number of folds.
_predictors (dict) – collection of predictors fitted on the K folds.
_calibrators (dict) – collection of calibrators fitted on the K folds.
method (str) – method used to handle the ensemble prediction and calibration. Defaults to 'cv+'.
- append_calibrator(key, calibrator)
Add a calibrator to the K-fold calibrators dictionary.
- Parameters:
key (int) – key of the calibrator.
calibrator (BaseCalibrator) – calibrator to be appended.
- append_predictor(key, predictor)
Add a predictor to the K-fold predictors dictionary.
- Parameters:
key (int) – key of the predictor.
predictor (BasePredictor|DualPredictor) – predictor to be appended.
- get_nonconformity_scores()
Get a dictionary of residuals computed on the K folds.
- Returns:
dictionary of residuals indexed by the K-fold number.
- Return type:
dict
- get_weights()
Get a dictionary of normalized weights computed on the K folds.
- Returns:
dictionary of normalized weights indexed by the K-fold number.
- Return type:
dict
- predict(X, alpha, correction_func=<function bonferroni>)
Predict point, interval and variability estimates for X data.
- Parameters:
X (Iterable) – features.
alpha (float) – significance level (max miscoverage target).
correction_func (Callable) – correction for multiple hypothesis testing in the case of multivariate regression. Defaults to Bonferroni correction.
- Returns:
y_pred, y_lower, y_higher.
- Return type:
Union[Tuple[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray, np.ndarray]]
- class conformalization.SplitConformalPredictor(predictor, *, nonconf_score_func=None, pred_set_func=None, train=True, random_state=None, weight_func=None, CalibratorClass=<class 'deel.puncc.api.calibration.BaseCalibrator'>)
Bases: object
Base class for split conformal prediction.
This class implements the generic split conformal prediction workflow: a predictive model is optionally trained on a fit subset, nonconformity scores are computed on a calibration subset, and prediction sets are produced for new samples with marginal coverage of at least 1 - alpha, provided the calibration and test samples are exchangeable.
For more details on the methodology, see the theory overview page.
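The three steps of this workflow can be sketched without any library. This is a minimal illustration assuming a one-dimensional least-squares model, absolute-residual nonconformity scores and constant-width intervals; fit_line, conformal_quantile and split_conformal are hypothetical helpers, not part of the puncc API.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (1-D, pure Python)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def conformal_quantile(scores, alpha):
    """ceil((1 - alpha)(n + 1))-th smallest score, or +inf if n is too small."""
    n = len(scores)
    k = math.ceil((1 - alpha) * (n + 1))
    return math.inf if k > n else sorted(scores)[k - 1]


def split_conformal(x_fit, y_fit, x_calib, y_calib, alpha):
    a, b = fit_line(x_fit, y_fit)  # 1) train on the fit subset
    # 2) nonconformity scores on the calibration subset
    scores = [abs(y - (a * x + b)) for x, y in zip(x_calib, y_calib)]
    q = conformal_quantile(scores, alpha)

    def predict(x):  # 3) point estimate and interval for a new sample
        y_hat = a * x + b
        return y_hat, y_hat - q, y_hat + q

    return predict


x_calib = list(range(10))
y_calib = [1.5, 2.0, 7.0, 6.75, 12.0, 9.5, 13.75, 16.0, 15.0, 19.5]
predict = split_conformal([0, 1, 2, 3], [1, 3, 5, 7], x_calib, y_calib, alpha=0.2)
print(predict(10))  # (21.0, 19.0, 23.0)
```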
- Parameters:
predictor (BasePredictor) – A predictor implementing fit and predict. The predictor may be already trained or trained during the call to fit().
nonconf_score_func (callable) – Function used to compute nonconformity scores from predictions and observed targets.
pred_set_func (callable) – Function used to construct prediction sets from predictions and calibrated quantiles.
train (bool) – If False, the predictor is assumed to be already trained and will not be retrained during fit(). Defaults to True.
random_state (int) – Random seed used when automatically splitting the data into fit and calibration subsets.
weight_func (callable) – Optional function mapping input features X to conformality weights, used for weighted conformal prediction. Defaults to None.
CalibratorClass – Class of the calibrator to be used. Defaults to BaseCalibrator.
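When weight_func is set, the calibration quantile is typically replaced by a weighted one, as in weighted conformal prediction under covariate shift (Tibshirani et al., 2019). The sketch below assumes the weights w(X_i) of the calibration points and w(x) of the test point are already computed, normalizes them together, and places the test point's mass at +inf; how the library normalizes weights internally is not confirmed here, and weighted_conformal_quantile is a hypothetical helper.

```python
import math


def weighted_conformal_quantile(scores, calib_weights, test_weight, alpha):
    """Hypothetical weighted (1 - alpha) conformal quantile.

    Calibration weights and the test weight are normalized together;
    the test point's share of the mass sits at +inf.
    """
    total = sum(calib_weights) + test_weight
    cum = 0.0
    for score, w in sorted(zip(scores, calib_weights)):
        cum += w / total  # normalized weight of this calibration point
        if cum >= 1 - alpha:
            return score
    return math.inf  # the remaining mass is the +inf atom of the test point


scores = [1.0, 2.0, 3.0, 4.0]
print(weighted_conformal_quantile(scores, [1, 1, 1, 1], 1, alpha=0.3))  # 4.0
print(weighted_conformal_quantile(scores, [4, 1, 1, 1], 1, alpha=0.3))  # 3.0
```

With equal weights the function returns the same ceil((1 - alpha)(n + 1))-th smallest score as the unweighted rule; up-weighting the smallest score lowers the quantile.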
Note
The data splitting strategy depends on the arguments passed to fit():
If X and y are provided, the data are randomly split into fit and calibration subsets.
If X_fit, y_fit, X_calib and y_calib are provided, these user-defined subsets are used directly.
If the predictor is already trained and train=False, only a calibration set is required.
Example:
```python
from deel.puncc.api.prediction import BasePredictor
from deel.puncc.api.conformalization import SplitConformalPredictor
from deel.puncc.api import nonconformity_scores
from deel.puncc.api import prediction_sets
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# Generate a regression problem
X, y = make_regression(n_samples=1000, n_features=4, random_state=0)

# Split data into train/test, then train into fit/calibration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_fit, X_calib, y_fit, y_calib = train_test_split(
    X_train, y_train, test_size=0.2, random_state=0
)

# Base predictor
model = RandomForestRegressor(random_state=0)
predictor = BasePredictor(model, is_trained=False)

# Split conformal predictor
cp = SplitConformalPredictor(predictor,
                             nonconf_score_func=nonconformity_scores.absolute_difference,
                             pred_set_func=prediction_sets.constant_interval,
                             train=True,
                             random_state=0)
cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)

# Conformal prediction
y_pred, y_lower, y_upper = cp.predict(X_test, alpha=0.2)
```
- fit(*, X=None, y=None, fit_ratio=0.8, X_fit=None, y_fit=None, X_calib=None, y_calib=None, use_cached=False, **kwargs)
This method fits the model on the fit data and computes nonconformity scores on the calibration data. If (X, y) are provided, the data are randomly split into fit and calibration subsets according to fit_ratio. If (X_fit, y_fit) and (X_calib, y_calib) are provided, the conformalization is performed on the given user-defined fit and calibration sets.
Note
If X and y are provided, fit ignores any user-defined fit/calib split.
- Parameters:
X (Iterable) – features from the training dataset.
y (Iterable) – labels from the training dataset.
fit_ratio (float) – the proportion of samples assigned to the fit subset.
X_fit (Iterable) – features from the fit dataset.
y_fit (Iterable) – labels from the fit dataset.
X_calib (Iterable) – features from the calibration dataset.
y_calib (Iterable) – labels from the calibration dataset.
use_cached (bool) – if True, adds the previously computed nonconformity scores (if any) to the pool estimated in the current call to fit. The aggregation follows the CV+ procedure.
kwargs (dict) – training configuration to be passed to the model's fit method.
- Raises:
RuntimeError – no dataset provided.
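The data selection rules described for fit (a random split of (X, y) according to fit_ratio, otherwise the user-defined subsets, otherwise an error) can be sketched as follows. select_fit_calib is a hypothetical helper that mirrors the documented behavior under simple assumptions (list inputs, a shuffle-then-cut split, int truncation of the fit size); it is not the library's implementation.

```python
import random


def select_fit_calib(X=None, y=None, fit_ratio=0.8, X_fit=None, y_fit=None,
                     X_calib=None, y_calib=None, train=True, seed=None):
    """Hypothetical helper returning (X_fit, y_fit, X_calib, y_calib)."""
    if X is not None and y is not None:
        # (X, y) take precedence: shuffle indices, then cut at fit_ratio
        idx = list(range(len(X)))
        random.Random(seed).shuffle(idx)
        n_fit = int(fit_ratio * len(idx))
        fit_idx, calib_idx = idx[:n_fit], idx[n_fit:]
        return ([X[i] for i in fit_idx], [y[i] for i in fit_idx],
                [X[i] for i in calib_idx], [y[i] for i in calib_idx])
    if X_calib is not None and y_calib is not None:
        # A pretrained model (train=False) only needs the calibration subset
        if train and (X_fit is None or y_fit is None):
            raise RuntimeError("No fit dataset provided.")
        return X_fit, y_fit, X_calib, y_calib
    raise RuntimeError("No dataset provided.")


X = list(range(8))
y = [2 * v for v in X]
Xf, yf, Xc, yc = select_fit_calib(X=X, y=y, fit_ratio=0.75, seed=0)
print(len(Xf), len(Xc))  # 6 2
```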
- get_nonconformity_scores()
Get computed nonconformity scores.
- Returns:
computed nonconformity scores.
- Return type:
ndarray
- predict(X_test, alpha)
Conformal interval predictions (w.r.t. the target miscoverage alpha) for new samples.
- Parameters:
X_test (Iterable) – features of new samples.
alpha (float) – target maximum miscoverage.
- Returns:
Tuple composed of the model estimate y_pred and the prediction set.
- Return type:
Tuple
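The marginal coverage targeted by predict can be checked empirically with a small simulation. The sketch assumes a hypothetical setting in which the model always predicts 0 and targets are standard normal noise, so absolute-difference scores with constant-width intervals [y_pred - q, y_pred + q] should cover roughly a 1 - alpha fraction of new samples; empirical_coverage is a hypothetical helper, not a library function.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """ceil((1 - alpha)(n + 1))-th smallest score, or +inf if n is too small."""
    n = len(scores)
    k = math.ceil((1 - alpha) * (n + 1))
    return math.inf if k > n else sorted(scores)[k - 1]


def empirical_coverage(alpha, n_calib=1000, n_test=5000, seed=0):
    """Fraction of test targets inside constant-width conformal intervals."""
    rng = random.Random(seed)
    y_pred = 0.0  # hypothetical model: always predicts 0
    calib_scores = [abs(rng.gauss(0.0, 1.0) - y_pred) for _ in range(n_calib)]
    q = conformal_quantile(calib_scores, alpha)
    covered = 0
    for _ in range(n_test):
        y_true = rng.gauss(0.0, 1.0)
        lower, upper = y_pred - q, y_pred + q  # constant interval
        covered += lower <= y_true <= upper
    return covered / n_test


print(empirical_coverage(alpha=0.2))  # should be near 1 - alpha = 0.8
```

The guarantee is marginal: it holds on average over calibration and test draws, so a single run can land slightly below 1 - alpha.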