Regression
This page lists the conformal prediction methods for regression currently implemented in puncc.
Each of these wrappers conformalizes a point-based or interval-based model passed
as an argument to the object constructor. Such models need to
implement the fit()
and predict()
methods.
The prediction module of the API ensures that
models from various ML/DL libraries (such as Keras and scikit-learn) comply with puncc.
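For instance, a scikit-learn regressor can be wrapped as follows. This is a minimal sketch: it relies only on the BasePredictor constructor and its is_trained flag, used exactly as in the class examples below.

from sklearn.linear_model import LinearRegression
from deel.puncc.api.prediction import BasePredictor

# Any model exposing fit() and predict() can be wrapped;
# is_trained=False tells puncc that the model still needs training.
model = LinearRegression()
predictor = BasePredictor(model, is_trained=False)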
- class deel.puncc.regression.SplitCP(predictor, *, train=True, random_state=None, weight_func=None)
Split conformal prediction method. For more details, we refer the user to the theory overview page.
- Parameters:
predictor (BasePredictor) – a predictor implementing fit and predict.
train (bool) – if False, prediction model(s) will not be (re)trained. Defaults to True.
random_state (int) – random seed used when the user does not provide a custom fit/calibration split in the fit method.
weight_func (callable) – function that takes as argument an array of features X and returns the associated "conformality" weights. Defaults to None.
Example:
import numpy as np

from deel.puncc.regression import SplitCP
from deel.puncc.api.prediction import BasePredictor

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

from deel.puncc.metrics import regression_mean_coverage
from deel.puncc.metrics import regression_sharpness

# Generate a random regression problem
X, y = make_regression(n_samples=1000, n_features=4, n_informative=2,
                       random_state=0, shuffle=False)

# Split data into train and test
X, X_test, y, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Split train data into fit and calibration
X_fit, X_calib, y_fit, y_calib = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Create a random forest model and wrap it in a predictor
rf_model = RandomForestRegressor(n_estimators=100, random_state=0)
rf_predictor = BasePredictor(rf_model, is_trained=False)

# CP method initialization
split_cp = SplitCP(rf_predictor)

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
split_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
y_pred, y_pred_lower, y_pred_upper = split_cp.predict(X_test, alpha=.2)

# Compute marginal coverage and average width of the prediction intervals
coverage = regression_mean_coverage(y_test, y_pred_lower, y_pred_upper)
width = regression_sharpness(y_pred_lower=y_pred_lower,
                             y_pred_upper=y_pred_upper)

print(f"Marginal coverage: {np.round(coverage, 2)}")
print(f"Average width: {np.round(width, 2)}")
- fit(*, X=None, y=None, fit_ratio=0.8, X_fit=None, y_fit=None, X_calib=None, y_calib=None, use_cached=False, **kwargs)
This method fits the models on the fit data and computes nonconformity scores on the calibration data. If (X, y) are provided, the data are randomly split into fit and calibration subsets according to fit_ratio (see the sketch after this entry). If (X_fit, y_fit) and (X_calib, y_calib) are provided, the conformalization is performed on the given user-defined fit and calibration sets.
Note
If X and y are provided, fit ignores any user-defined fit/calib split.
- Parameters:
X (Iterable) – features from the training dataset.
y (Iterable) – labels from the training dataset.
fit_ratio (float) – the proportion of samples assigned to the fit subset.
X_fit (Iterable) – features from the fit dataset.
y_fit (Iterable) – labels from the fit dataset.
X_calib (Iterable) – features from the calibration dataset.
y_calib (Iterable) – labels from the calibration dataset.
use_cached (bool) – if True, the previously computed nonconformity scores (if any) are added to the pool estimated in the current call to fit. The aggregation follows the CV+ procedure.
kwargs (dict) – fit configuration to be passed to the model's fit method.
- Raises:
RuntimeError – no dataset provided.
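Besides the user-defined split shown in the example above, fit can split the data itself. A minimal sketch, reusing rf_predictor and the training data (X, y) from that example:

# Let puncc randomly split (X, y) into fit and calibration subsets
# (80%/20% here); the split is controlled by the constructor's random_state.
split_cp = SplitCP(rf_predictor, random_state=0)
split_cp.fit(X=X, y=y, fit_ratio=0.8)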
- get_nonconformity_scores()
Get computed nonconformity scores.
- Returns:
computed nonconformity scores.
- Return type:
ndarray
- predict(X_test, alpha)
Conformal interval predictions (w.r.t. the target miscoverage alpha) for new samples.
- Parameters:
X_test (Iterable) – features of new samples.
alpha (float) – target maximum miscoverage.
- Returns:
y_pred, y_lower, y_higher
- Return type:
Tuple[np.ndarray, np.ndarray, np.ndarray]
- class deel.puncc.regression.LocallyAdaptiveCP(predictor, *, train=True, random_state=None, weight_func=None)
Locally adaptive conformal prediction method. For more details, we refer the user to the theory overview page.
- Parameters:
predictor (MeanVarPredictor) – a predictor implementing fit and predict. Must embed two models, for point and dispersion estimation respectively.
train (bool) – if False, prediction model(s) will not be (re)trained. Defaults to True.
random_state (int) – random seed used when the user does not provide a custom fit/calibration split in the fit method.
weight_func (callable) – function that takes as argument an array of features X and returns the associated "conformality" weights. Defaults to None.
Example:
import numpy as np

from deel.puncc.regression import LocallyAdaptiveCP
from deel.puncc.api.prediction import MeanVarPredictor

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

from deel.puncc.metrics import regression_mean_coverage
from deel.puncc.metrics import regression_sharpness

# Generate a random regression problem
X, y = make_regression(n_samples=1000, n_features=4, n_informative=2,
                       random_state=0, shuffle=False)

# Split data into train and test
X, X_test, y, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Split train data into fit and calibration
X_fit, X_calib, y_fit, y_calib = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Create two models mu (mean) and sigma (dispersion)
mu_model = RandomForestRegressor(n_estimators=100, random_state=0)
sigma_model = RandomForestRegressor(n_estimators=100, random_state=0)

# Wrap models in a mean/variance predictor
mean_var_predictor = MeanVarPredictor(models=[mu_model, sigma_model])

# CP method initialization
lacp = LocallyAdaptiveCP(mean_var_predictor)

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
lacp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
y_pred, y_pred_lower, y_pred_upper = lacp.predict(X_test, alpha=.2)

# Compute marginal coverage and average width of the prediction intervals
coverage = regression_mean_coverage(y_test, y_pred_lower, y_pred_upper)
width = regression_sharpness(y_pred_lower=y_pred_lower,
                             y_pred_upper=y_pred_upper)

print(f"Marginal coverage: {np.round(coverage, 2)}")
print(f"Average width: {np.round(width, 2)}")
- fit(*, X=None, y=None, fit_ratio=0.8, X_fit=None, y_fit=None, X_calib=None, y_calib=None, use_cached=False, **kwargs)
This method fits the models on the fit data and computes nonconformity scores on the calibration data. If (X, y) are provided, the data are randomly split into fit and calibration subsets according to fit_ratio. If (X_fit, y_fit) and (X_calib, y_calib) are provided, the conformalization is performed on the given user-defined fit and calibration sets.
Note
If X and y are provided, fit ignores any user-defined fit/calib split.
- Parameters:
X (Iterable) – features from the training dataset.
y (Iterable) – labels from the training dataset.
fit_ratio (float) – the proportion of samples assigned to the fit subset.
X_fit (Iterable) – features from the fit dataset.
y_fit (Iterable) – labels from the fit dataset.
X_calib (Iterable) – features from the calibration dataset.
y_calib (Iterable) – labels from the calibration dataset.
use_cached (bool) – if True, the previously computed nonconformity scores (if any) are added to the pool estimated in the current call to fit. The aggregation follows the CV+ procedure.
kwargs (dict) – fit configuration to be passed to the model's fit method.
- Raises:
RuntimeError – no dataset provided.
- get_nonconformity_scores()
Get computed nonconformity scores.
- Returns:
computed nonconformity scores.
- Return type:
ndarray
- predict(X_test, alpha)
Conformal interval predictions (w.r.t. the target miscoverage alpha) for new samples.
- Parameters:
X_test (Iterable) – features of new samples.
alpha (float) – target maximum miscoverage.
- Returns:
y_pred, y_lower, y_higher
- Return type:
Tuple[np.ndarray, np.ndarray, np.ndarray]
- class deel.puncc.regression.CQR(predictor, *, train=True, weight_func=None)
Conformalized quantile regression method. For more details, we refer the user to the theory overview page.
- Parameters:
predictor (DualPredictor) – a predictor implementing fit and predict. Must embed two models, for lower and upper quantile estimation respectively.
train (bool) – if False, prediction model(s) will not be (re)trained. Defaults to True.
weight_func (callable) – function that takes as argument an array of features X and returns the associated "conformality" weights. Defaults to None.
Example:
import numpy as np

from deel.puncc.regression import CQR
from deel.puncc.api.prediction import DualPredictor

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor

from deel.puncc.metrics import regression_mean_coverage
from deel.puncc.metrics import regression_sharpness

# Generate a random regression problem
X, y = make_regression(n_samples=1000, n_features=4, n_informative=2,
                       random_state=0, shuffle=False)

# Split data into train and test
X, X_test, y, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Split train data into fit and calibration
X_fit, X_calib, y_fit, y_calib = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Lower quantile regressor
regressor_q_low = GradientBoostingRegressor(
    loss="quantile", alpha=.2/2, n_estimators=250
)

# Upper quantile regressor
regressor_q_hi = GradientBoostingRegressor(
    loss="quantile", alpha=1 - .2/2, n_estimators=250
)

# Wrap models in predictor
predictor = DualPredictor(models=[regressor_q_low, regressor_q_hi])

# CP method initialization
crq = CQR(predictor)

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
crq.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
y_pred, y_pred_lower, y_pred_upper = crq.predict(X_test, alpha=.2)

# Compute marginal coverage and average width of the prediction intervals
coverage = regression_mean_coverage(y_test, y_pred_lower, y_pred_upper)
width = regression_sharpness(y_pred_lower=y_pred_lower,
                             y_pred_upper=y_pred_upper)

print(f"Marginal coverage: {np.round(coverage, 2)}")
print(f"Average width: {np.round(width, 2)}")
- fit(*, X=None, y=None, fit_ratio=0.8, X_fit=None, y_fit=None, X_calib=None, y_calib=None, use_cached=False, **kwargs)
This method fits the models on the fit data and computes nonconformity scores on the calibration data. If (X, y) are provided, the data are randomly split into fit and calibration subsets according to fit_ratio. If (X_fit, y_fit) and (X_calib, y_calib) are provided, the conformalization is performed on the given user-defined fit and calibration sets.
Note
If X and y are provided, fit ignores any user-defined fit/calib split.
- Parameters:
X (Iterable) – features from the training dataset.
y (Iterable) – labels from the training dataset.
fit_ratio (float) – the proportion of samples assigned to the fit subset.
X_fit (Iterable) – features from the fit dataset.
y_fit (Iterable) – labels from the fit dataset.
X_calib (Iterable) – features from the calibration dataset.
y_calib (Iterable) – labels from the calibration dataset.
use_cached (bool) – if True, the previously computed nonconformity scores (if any) are added to the pool estimated in the current call to fit. The aggregation follows the CV+ procedure.
kwargs (dict) – fit configuration to be passed to the model's fit method.
- Raises:
RuntimeError – no dataset provided.
- get_nonconformity_scores()
Get computed nonconformity scores.
- Returns:
computed nonconformity scores.
- Return type:
ndarray
- predict(X_test, alpha)
Conformal interval predictions (w.r.t. the target miscoverage alpha) for new samples.
- Parameters:
X_test (Iterable) – features of new samples.
alpha (float) – target maximum miscoverage.
- Returns:
y_pred, y_lower, y_higher
- Return type:
Tuple[np.ndarray, np.ndarray, np.ndarray]
- class deel.puncc.regression.CVPlus(predictor, *, K, random_state=None)
Cross-validation plus method. For more details, we refer the user to the theory overview page.
- Parameters:
predictor (BasePredictor) – a predictor implementing fit and predict.
K (int) – number of training/calibration folds.
random_state (int) – seed to control random folds.
Example:
import numpy as np

from deel.puncc.regression import CVPlus
from deel.puncc.api.prediction import BasePredictor

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

from deel.puncc.metrics import regression_mean_coverage
from deel.puncc.metrics import regression_sharpness

# Generate a random regression problem
X, y = make_regression(n_samples=1000, n_features=4, n_informative=2,
                       random_state=0, shuffle=False)

# Split data into train and test
X, X_test, y, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Create a random forest model and wrap it in a predictor
rf_model = RandomForestRegressor(n_estimators=100, random_state=0)
rf_predictor = BasePredictor(rf_model, is_trained=False)

# CP method initialization
cv_cp = CVPlus(rf_predictor, K=20, random_state=0)

# The call to `fit` trains the model and computes the nonconformity
# scores on the K-fold calibration sets
cv_cp.fit(X, y)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
y_pred, y_pred_lower, y_pred_upper = cv_cp.predict(X_test, alpha=.2)

# Compute marginal coverage and average width of the prediction intervals
coverage = regression_mean_coverage(y_test, y_pred_lower, y_pred_upper)
width = regression_sharpness(y_pred_lower=y_pred_lower,
                             y_pred_upper=y_pred_upper)

print(f"Marginal coverage: {np.round(coverage, 2)}")
print(f"Average width: {np.round(width, 2)}")
- fit(X, y, use_cached=False, **kwargs)
This method fits the ensemble models based on the K-fold scheme. The out-of-bag folds are used to compute residuals on (X_calib, y_calib).
- Parameters:
X (Iterable) – features from the train dataset.
y (Iterable) – labels from the train dataset.
use_cached (bool) – if True, the previously computed nonconformity scores (if any) are added to the pool estimated in the current call to fit. The aggregation follows the CV+ procedure.
kwargs (dict) – fit configuration to be passed to the model's fit method.
- get_nonconformity_scores()
Get computed nonconformity scores per K-fold.
- Returns:
computed nonconformity scores per K-fold.
- Return type:
dict
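The per-fold scores can then be inspected; a short sketch, reusing cv_cp from the example above and assuming the returned dict maps each fold index to an array of scores (the exact dict structure is an assumption):

# Inspect nonconformity scores fold by fold (dict structure assumed)
scores_per_fold = cv_cp.get_nonconformity_scores()
for fold, scores in scores_per_fold.items():
    print(f"Fold {fold}: {len(scores)} scores")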
- predict(X_test, alpha)
Conformal interval predictions (w.r.t. the target miscoverage alpha) for new samples.
- Parameters:
X_test (Iterable) – features of new samples.
alpha (float) – target maximum miscoverage.
- Returns:
y_pred, y_lower, y_higher
- Return type:
Tuple[np.ndarray, np.ndarray, np.ndarray]
- class deel.puncc.regression.EnbPI(predictor, B, agg_func_loo=<function mean>, random_state=None)
Ensemble batch prediction intervals (EnbPI) method.
- Parameters:
predictor (BasePredictor) – object implementing the .fit() and .predict() methods.
B (int) – number of bootstrap models.
agg_func_loo (func) – aggregation function of LOO estimators.
random_state (int) – seed that determines random number generation.
Note
Xu et al. defined two aggregation functions of leave-one-out estimators: the mean and the median.
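For instance, the median can be used instead of the mean. A sketch, assuming agg_func_loo accepts any callable with the signature of np.mean (rf_predictor is defined in the example below):

import numpy as np

# Aggregate leave-one-out estimates with the median instead of the mean
enbpi_median = EnbPI(rf_predictor, B=30, agg_func_loo=np.median, random_state=0)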
Example:
import numpy as np

from deel.puncc.regression import EnbPI
from deel.puncc.api.prediction import BasePredictor

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

from deel.puncc.metrics import regression_mean_coverage
from deel.puncc.metrics import regression_sharpness

# Generate a random regression problem
X, y = make_regression(n_samples=1000, n_features=4, n_informative=2,
                       random_state=0, shuffle=False)

# Split data into train and test
X, X_test, y, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Create rf regressor
rf_model = RandomForestRegressor(n_estimators=100, random_state=0)

# Wrap model in a predictor
rf_predictor = BasePredictor(rf_model)

# CP method initialization
enbpi = EnbPI(
    rf_predictor,
    B=30,
    agg_func_loo=np.mean,
    random_state=0,
)

# The call to `fit` trains the model and computes the nonconformity
# scores on the oob calibration sets
enbpi.fit(X, y)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
y_pred, y_pred_lower, y_pred_upper = enbpi.predict(
    X_test, alpha=.2, y_true=y_test, s=None
)

# Compute marginal coverage and average width of the prediction intervals
coverage = regression_mean_coverage(y_test, y_pred_lower, y_pred_upper)
width = regression_sharpness(y_pred_lower=y_pred_lower,
                             y_pred_upper=y_pred_upper)
- fit(X, y, **kwargs)
Fit B bootstrap models on the bootstrap bags and compute/store residuals on the corresponding out-of-bag samples.
- Parameters:
X (ndarray) – training feature set.
y (ndarray) – training label set.
kwargs (dict) – fit arguments for the underlying model.
- Raises:
RuntimeError – empty out-of-bag.
- predict(X_test, alpha=0.1, y_true=None, s=None)
Estimate the conditional mean and prediction interval for new samples (a sketch of online updating follows this entry).
- Parameters:
X_test (ndarray) – features of new samples.
y_true (ndarray) – if not None, an online update of the residuals based on seasonality is performed.
alpha (float) – target maximum miscoverage.
s (int) – number of online samples necessary to update the residuals sequence.
- Returns:
A tuple composed of y_pred (conditional mean), y_pred_lower (lower PI bound) and y_pred_upper (upper PI bound).
- Return type:
Tuple[np.ndarray, np.ndarray, np.ndarray]
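A minimal sketch of the online mode, reusing enbpi from the example above; it assumes, per the parameter descriptions, that providing y_true together with s refreshes the residuals sequence every s samples:

# Online mode: the residuals are updated with the incoming ground-truth
# values every s = 100 test samples.
y_pred, y_pred_lower, y_pred_upper = enbpi.predict(
    X_test, alpha=.1, y_true=y_test, s=100
)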
- class deel.puncc.regression.AdaptiveEnbPI(predictor, B, agg_func_loo=<function mean>, random_state=None)
Locally adaptive version of the ensemble batch prediction intervals (EnbPI) method.
- Parameters:
predictor (MeanVarPredictor) – object implementing the .fit() and .predict() methods.
B (int) – number of bootstrap models.
agg_func_loo (func) – aggregation function of LOO estimators.
random_state (int) – seed that determines random number generation.
Note
Xu et al. defined two aggregation functions of leave-one-out estimators: the mean and the median.
Example:
import numpy as np

from deel.puncc.regression import AdaptiveEnbPI
from deel.puncc.api.prediction import MeanVarPredictor

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

from deel.puncc.metrics import regression_mean_coverage
from deel.puncc.metrics import regression_sharpness

# Generate a random regression problem
X, y = make_regression(n_samples=1000, n_features=4, n_informative=2,
                       random_state=0, shuffle=False)

# Split data into train and test
X, X_test, y, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Create two models mu (mean) and sigma (dispersion)
mean_model = RandomForestRegressor(n_estimators=100, random_state=0)
sigma_model = RandomForestRegressor(n_estimators=100, random_state=0)

# Wrap models in a mean/variance predictor
mean_var_predictor = MeanVarPredictor([mean_model, sigma_model])

# CP method initialization
aenbpi = AdaptiveEnbPI(
    mean_var_predictor,
    B=30,
    agg_func_loo=np.mean,
    random_state=0,
)

# The call to `fit` trains the model and computes the nonconformity
# scores on the oob calibration sets
aenbpi.fit(X, y)

# The predict method infers prediction intervals with respect to
# the significance level alpha = 20%
y_pred, y_pred_lower, y_pred_upper = aenbpi.predict(
    X_test, alpha=.2, y_true=y_test, s=None
)

# Compute marginal coverage and average width of the prediction intervals
coverage = regression_mean_coverage(y_test, y_pred_lower, y_pred_upper)
width = regression_sharpness(y_pred_lower=y_pred_lower,
                             y_pred_upper=y_pred_upper)
- fit(X, y, **kwargs)
Fit B bootstrap models on the bootstrap bags and compute/store residuals on the corresponding out-of-bag samples.
- Parameters:
X (ndarray) – training feature set.
y (ndarray) – training label set.
kwargs (dict) – fit arguments for the underlying model.
- Raises:
RuntimeError – empty out-of-bag.
- predict(X_test, alpha=0.1, y_true=None, s=None)
Estimate the conditional mean and prediction interval for new samples.
- Parameters:
X_test (ndarray) – features of new samples.
y_true (ndarray) – if not None, an online update of the residuals based on seasonality is performed.
alpha (float) – target maximum miscoverage.
s (int) – number of online samples necessary to update the residuals sequence.
- Returns:
A tuple composed of y_pred (conditional mean), y_pred_lower (lower PI bound) and y_pred_upper (upper PI bound).
- Return type:
Tuple[np.ndarray, np.ndarray, np.ndarray]