Classification
The conformal prediction methods currently implemented for classification are listed on this page.
Each wrapper conformalizes the model passed as an argument to its constructor. Such models need to implement the fit() and predict() methods.
The prediction module of the API ensures that models from various ML/DL libraries (such as Keras and scikit-learn) comply with puncc.
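As a rough illustration of the interface requirement above, the sketch below defines a hypothetical `PriorClassifier` (not part of puncc) that exposes `fit` and `predict`. It assumes, as in the examples on this page, that `predict` should return one row of class probabilities per sample.

```python
import numpy as np


class PriorClassifier:
    """Hypothetical toy model: predicts the empirical class frequencies."""

    def fit(self, X, y, **kwargs):
        # Estimate class frequencies from integer labels 0..K-1
        counts = np.bincount(np.asarray(y))
        self.priors_ = counts / counts.sum()
        return self

    def predict(self, X, **kwargs):
        # Return the same probability vector for every sample
        return np.tile(self.priors_, (len(X), 1))


model = PriorClassifier().fit(X=[[0.0], [1.0], [2.0], [3.0]], y=[0, 0, 1, 1])
probs = model.predict([[4.0], [5.0]])
print(probs)
```

A model of this shape could then be wrapped in BasePredictor, in the same way the examples on this page wrap a scikit-learn model.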
- class deel.puncc.classification.RAPS(predictor, train=True, random_state=None, lambd=0, k_reg=1, rand=True)
Implementation of Regularized Adaptive Prediction Sets (RAPS). The hyperparameters \(\lambda\) and \(k_{reg}\) are used to encourage small prediction sets. For more details, we refer the user to the theory overview page.
- Parameters:
predictor (BasePredictor) – a predictor implementing fit and predict.
train (bool) – if False, prediction model(s) will not be trained and will be used as is. Defaults to True.
random_state (float) – random seed used when the user does not provide a custom fit/calibration split in the fit method.
lambd (float) – positive weight associated with the regularization term that encourages small set sizes. If \(\lambda = 0\), there is no regularization and the implementation reduces to APS.
k_reg (float) – class rank (ordered by descending probability) starting from which the regularization is applied. For example, if \(k_{reg} = 3\), then the fourth most likely estimated class has an extra penalty of size \(\lambda\).
rand (bool) – turn on or off the randomization used in the RAPS algorithm. One consequence of turning off randomization is avoiding empty prediction sets.
Note
If \(\lambda = 0\), there is no regularization and the implementation reduces to APS.
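The role of \(\lambda\) and \(k_{reg}\) can be sketched with the RAPS nonconformity score as it is commonly formulated: the probability mass of the classes ranked above the true label, plus a (possibly randomized) share of the true label's own probability, plus a penalty \(\lambda \cdot \max(0, \text{rank} - k_{reg})\). The function `raps_score` and its argument `u` are illustrative names rather than puncc's API, and puncc's internal implementation may differ.

```python
import numpy as np


def raps_score(probs, label, lambd=0.0, k_reg=1, u=1.0):
    """Illustrative RAPS score for one sample (not puncc's implementation).

    probs: 1-D array of class probabilities.
    label: index of the true class.
    u: randomization factor in [0, 1]; u=1.0 means no randomization.
    """
    probs = np.asarray(probs, dtype=float)
    # Rank classes by descending probability (rank 1 = most likely)
    order = np.argsort(-probs, kind="stable")
    rank = int(np.nonzero(order == label)[0][0]) + 1
    sorted_probs = probs[order]
    # Probability mass of the classes strictly above the true label
    mass_before = sorted_probs[: rank - 1].sum()
    # Penalty applies only to ranks beyond k_reg
    penalty = lambd * max(0, rank - k_reg)
    return float(mass_before + u * sorted_probs[rank - 1] + penalty)


probs = np.array([0.5, 0.3, 0.15, 0.05])
# True label is the fourth most likely class; with k_reg=3 it is penalized once
print(raps_score(probs, label=3, lambd=0.1, k_reg=3))
# With lambd=0 the penalty vanishes and the score is the APS score
print(raps_score(probs, label=3, lambd=0.0, k_reg=3))
```

Larger scores on the calibration set push the calibrated threshold up, while the same penalty makes low-ranked classes more expensive to include at prediction time, which is how the regularization favors smaller sets.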
Example:
from deel.puncc.classification import RAPS
from deel.puncc.api.prediction import BasePredictor

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

from deel.puncc.metrics import classification_mean_coverage
from deel.puncc.metrics import classification_mean_size

import numpy as np

from tensorflow.keras.utils import to_categorical

# Generate a random classification problem
X, y = make_classification(
    n_samples=1000,
    n_features=4,
    n_informative=2,
    n_classes=2,
    random_state=0,
    shuffle=False,
)

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Split train data into fit and calibration
X_fit, X_calib, y_fit, y_calib = train_test_split(
    X_train, y_train, test_size=.2, random_state=0
)

# One-hot encoding of classes (not used in the rest of this example)
y_fit_cat = to_categorical(y_fit)
y_calib_cat = to_categorical(y_calib)
y_test_cat = to_categorical(y_test)

# Create rf classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=0)

# Create a wrapper of the random forest model so that its predict method
# returns class probabilities. Make sure to subclass BasePredictor.
# Note that we needed to build a new wrapper (over BasePredictor) only because
# the predict(.) method of RandomForestClassifier returns class labels
# rather than probabilities. Otherwise, it is enough to use BasePredictor
# (e.g., neural network with softmax).
class RFPredictor(BasePredictor):
    def predict(self, X, **kwargs):
        return self.model.predict_proba(X, **kwargs)

# Wrap model in the newly created RFPredictor
rf_predictor = RFPredictor(rf_model)

# CP method initialization
raps_cp = RAPS(rf_predictor, k_reg=2, lambd=1)

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
raps_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)

# The predict method infers prediction sets with respect to
# the significance level alpha = 20%
y_pred, set_pred = raps_cp.predict(X_test, alpha=.2)

# Compute marginal coverage and average prediction set size
coverage = classification_mean_coverage(y_test, set_pred)
size = classification_mean_size(set_pred)

print(f"Marginal coverage: {np.round(coverage, 2)}")
print(f"Average prediction set size: {np.round(size, 2)}")
- fit(*, X=None, y=None, fit_ratio=0.8, X_fit=None, y_fit=None, X_calib=None, y_calib=None, **kwargs)
This method fits the models on the fit data and computes nonconformity scores on the calibration data. If (X, y) are provided, the data are randomly split into fit and calibration subsets according to fit_ratio. If (X_fit, y_fit) and (X_calib, y_calib) are provided, conformalization is performed on the user-defined fit and calibration sets.
Note
If X and y are provided, fit ignores any user-defined fit/calib split.
- Parameters:
X (Iterable) – features from the training dataset.
y (Iterable) – labels from the training dataset.
fit_ratio (float) – the proportion of samples assigned to the fit subset.
X_fit (Iterable) – features from the fit dataset.
y_fit (Iterable) – labels from the fit dataset.
X_calib (Iterable) – features from the calibration dataset.
y_calib (Iterable) – labels from the calibration dataset.
kwargs (dict) – configuration to be passed to the model's fit method.
- Raises:
RuntimeError – no dataset provided.
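The random split controlled by fit_ratio can be pictured with the sketch below. The helper `split_fit_calib` is a hypothetical name used only for illustration; it is an assumption about the general idea (shuffle, then cut at the given proportion), not puncc's internal splitting code.

```python
import numpy as np


def split_fit_calib(X, y, fit_ratio=0.8, random_state=None):
    """Illustrative random fit/calibration split (not puncc's implementation)."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(random_state)
    # Shuffle sample indices, then cut at the requested proportion
    idx = rng.permutation(len(X))
    n_fit = int(fit_ratio * len(X))
    fit_idx, calib_idx = idx[:n_fit], idx[n_fit:]
    return X[fit_idx], y[fit_idx], X[calib_idx], y[calib_idx]


X = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2
X_fit, y_fit, X_calib, y_calib = split_fit_calib(X, y, fit_ratio=0.8, random_state=0)
print(len(X_fit), len(X_calib))  # 8 samples for fitting, 2 for calibration
```

With puncc itself, the two modes correspond to calling fit(X=..., y=..., fit_ratio=0.8) or fit(X_fit=..., y_fit=..., X_calib=..., y_calib=...), following the signature above.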
- predict(X_test, alpha)
Conformal set predictions (w.r.t. target miscoverage alpha) for new samples.
- Parameters:
X_test (Iterable) – features of new samples.
alpha (float) – target maximum miscoverage.
- Returns:
Tuple composed of the model estimate y_pred and the prediction set set_pred
- Return type:
Tuple
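To make the link between alpha and the returned prediction set more concrete, here is a minimal sketch of the standard split-conformal recipe without randomization or regularization: take the \(\lceil (n+1)(1-\alpha) \rceil\)-th smallest calibration score as a threshold, then add classes by descending probability until the cumulative mass crosses it. The names `conformal_quantile` and `prediction_set` are hypothetical, and the details of puncc's own implementation may differ.

```python
import math

import numpy as np


def conformal_quantile(calib_scores, alpha):
    """Finite-sample corrected (1 - alpha) quantile of calibration scores."""
    scores = np.sort(np.asarray(calib_scores, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    # Too few calibration points for this alpha: no finite threshold exists
    return float("inf") if k > n else float(scores[k - 1])


def prediction_set(probs, q_hat):
    """Classes ranked by probability, up to and including the one that
    makes the cumulative mass exceed q_hat (so the set is never empty)."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-probs, kind="stable")
    cum_mass = np.cumsum(probs[order])
    size = min(len(probs), int(np.sum(cum_mass <= q_hat)) + 1)
    return order[:size].tolist()


calib_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
q_hat = conformal_quantile(calib_scores, alpha=0.2)
print(q_hat)
print(prediction_set([0.6, 0.3, 0.1], q_hat=0.7))
```

A smaller alpha raises the threshold and therefore tends to produce larger sets; when the calibration set is too small for the requested alpha, the threshold is infinite and every class is included.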
- class deel.puncc.classification.APS(predictor, train=True, rand=True)
Implementation of Adaptive Prediction Sets (APS). For more details, we refer the user to the theory overview page.
- Parameters:
predictor (BasePredictor) – a predictor implementing fit and predict.
train (bool) – if False, prediction model(s) will not be trained and will be used as is. Defaults to True.
rand (bool) – turn on or off the randomization used in the APS algorithm. One consequence of turning off randomization is avoiding empty prediction sets.
Example:
from deel.puncc.classification import APS
from deel.puncc.api.prediction import BasePredictor

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

from deel.puncc.metrics import classification_mean_coverage
from deel.puncc.metrics import classification_mean_size

import numpy as np

from tensorflow.keras.utils import to_categorical

# Generate a random classification problem
X, y = make_classification(
    n_samples=1000,
    n_features=4,
    n_informative=2,
    n_classes=2,
    random_state=0,
    shuffle=False,
)

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Split train data into fit and calibration
X_fit, X_calib, y_fit, y_calib = train_test_split(
    X_train, y_train, test_size=.2, random_state=0
)

# One-hot encoding of classes (not used in the rest of this example)
y_fit_cat = to_categorical(y_fit)
y_calib_cat = to_categorical(y_calib)
y_test_cat = to_categorical(y_test)

# Create rf classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=0)

# Create a wrapper of the random forest model so that its predict method
# returns class probabilities. Make sure to subclass BasePredictor.
# Note that we needed to build a new wrapper (over BasePredictor) only because
# the predict(.) method of RandomForestClassifier returns class labels
# rather than probabilities. Otherwise, it is enough to use BasePredictor
# (e.g., neural network with softmax).
class RFPredictor(BasePredictor):
    def predict(self, X, **kwargs):
        return self.model.predict_proba(X, **kwargs)

# Wrap model in the newly created RFPredictor
rf_predictor = RFPredictor(rf_model)

# CP method initialization
aps_cp = APS(rf_predictor)

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
aps_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)

# The predict method infers prediction sets with respect to
# the significance level alpha = 20%
y_pred, set_pred = aps_cp.predict(X_test, alpha=.2)

# Compute marginal coverage and average prediction set size
coverage = classification_mean_coverage(y_test, set_pred)
size = classification_mean_size(set_pred)

print(f"Marginal coverage: {np.round(coverage, 2)}")
print(f"Average prediction set size: {np.round(size, 2)}")
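The two quantities printed at the end of the example can be understood through the sketch below: marginal coverage is the fraction of samples whose true label falls inside the prediction set, and average size is the mean number of classes per set. The helpers `mean_coverage` and `mean_size` are hypothetical stand-ins written for illustration; they assume each prediction set is a collection of class labels and are not the code behind puncc's metrics.

```python
import numpy as np


def mean_coverage(y_true, set_pred):
    """Fraction of samples whose true label belongs to its prediction set."""
    hits = [label in pred_set for label, pred_set in zip(y_true, set_pred)]
    return float(np.mean(hits))


def mean_size(set_pred):
    """Average number of classes per prediction set."""
    return float(np.mean([len(pred_set) for pred_set in set_pred]))


y_true = [0, 1, 2]
set_pred = [[0], [0, 2], [1, 2]]
# Labels 0 and 2 are covered, label 1 is not: coverage is 2/3
print(np.round(mean_coverage(y_true, set_pred), 2))
# Sets have sizes 1, 2 and 2: average size is 5/3
print(np.round(mean_size(set_pred), 2))
```

For a target miscoverage alpha, one would hope to see coverage close to or above 1 - alpha on held-out data, with sets as small as possible.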
- fit(*, X=None, y=None, fit_ratio=0.8, X_fit=None, y_fit=None, X_calib=None, y_calib=None, **kwargs)
This method fits the models on the fit data and computes nonconformity scores on the calibration data. If (X, y) are provided, the data are randomly split into fit and calibration subsets according to fit_ratio. If (X_fit, y_fit) and (X_calib, y_calib) are provided, conformalization is performed on the user-defined fit and calibration sets.
Note
If X and y are provided, fit ignores any user-defined fit/calib split.
- Parameters:
X (Iterable) – features from the training dataset.
y (Iterable) – labels from the training dataset.
fit_ratio (float) – the proportion of samples assigned to the fit subset.
X_fit (Iterable) – features from the fit dataset.
y_fit (Iterable) – labels from the fit dataset.
X_calib (Iterable) – features from the calibration dataset.
y_calib (Iterable) – labels from the calibration dataset.
kwargs (dict) – configuration to be passed to the model's fit method.
- Raises:
RuntimeError – no dataset provided.
- predict(X_test, alpha)
Conformal set predictions (w.r.t. target miscoverage alpha) for new samples.
- Parameters:
X_test (Iterable) – features of new samples.
alpha (float) – target maximum miscoverage.
- Returns:
Tuple composed of the model estimate y_pred and the prediction set set_pred
- Return type:
Tuple