📊 Classification

The conformal prediction methods currently implemented for classification are listed on this page.

Each wrapper conformalizes the model passed as an argument to its constructor. Such models need to implement the fit() and predict() methods. The prediction module of the API ensures that models from various ML/DL libraries (such as Keras and scikit-learn) are compatible with puncc.

class deel.puncc.classification.LAC(predictor, train=True, random_state=None)

Implementation of the Least Ambiguous Set-Valued Classifier (LAC). For more details, we refer the user to the theory overview page.

Parameters:

predictor (BasePredictor) – a predictor implementing fit and predict.
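train (bool) – if False, prediction model(s) will not be trained and will be used as is. Defaults to True.
random_state (float) – random seed used when the user does not provide a custom fit/calibration split in fit method.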
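For intuition, the LAC procedure can be sketched in a few lines of plain Python. This is a minimal illustration of the usual split-conformal recipe, not puncc's internal implementation. The helper names (lac_scores, conformal_quantile, lac_prediction_set) are hypothetical, and the sketch assumes that the nonconformity score is one minus the probability assigned to the true class and that labels are integer class indices.

```python
import math

def lac_scores(probs, labels):
    """Nonconformity score (assumed): 1 - probability of the true class."""
    return [1.0 - row[y] for row, y in zip(probs, labels)]

def conformal_quantile(scores, alpha):
    """Score of rank ceil((n + 1) * (1 - alpha)) among n calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: no finite threshold
        return math.inf
    return sorted(scores)[rank - 1]

def lac_prediction_set(prob_row, threshold):
    """Keep every class whose score 1 - p does not exceed the threshold."""
    return [k for k, p in enumerate(prob_row) if 1.0 - p <= threshold]

# Toy calibration data: predicted probabilities and true labels
calib_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
calib_labels = [0, 1, 0, 0]

scores = lac_scores(calib_probs, calib_labels)
threshold = conformal_quantile(scores, alpha=0.5)
print(lac_prediction_set([0.7, 0.3], threshold))
```

Note that, under these assumptions, a prediction set can be empty when no class is confident enough, and it contains every class when the threshold is infinite.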

Example:

from deel.puncc.classification import LAC
from deel.puncc.api.prediction import BasePredictor

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

from deel.puncc.metrics import classification_mean_coverage
from deel.puncc.metrics import classification_mean_size

import numpy as np

from tensorflow.keras.utils import to_categorical

# Generate a random classification problem
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_classes=2, random_state=0, shuffle=False)

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Split train data into fit and calibration
X_fit, X_calib, y_fit, y_calib = train_test_split(
    X_train, y_train, test_size=.2, random_state=0
)

# One hot encoding of classes
y_fit_cat = to_categorical(y_fit)
y_calib_cat = to_categorical(y_calib)
y_test_cat = to_categorical(y_test)

# Create rf classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=0)

# Create a wrapper of the random forest model to redefine its predict method
# so that it returns class probabilities. Make sure to subclass BasePredictor.
# Note that we needed to build a new wrapper (over BasePredictor) only because
# the predict(.) method of RandomForestClassifier returns class labels rather
# than probabilities. Otherwise, it is enough to use BasePredictor
# (e.g., neural network with softmax output).
class RFPredictor(BasePredictor):
    def predict(self, X, **kwargs):
        return self.model.predict_proba(X, **kwargs)

# Wrap model in the newly created RFPredictor
rf_predictor = RFPredictor(rf_model)

# CP method initialization
lac_cp = LAC(rf_predictor)

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
lac_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)


# The predict method infers prediction sets with respect to
# the significance level alpha = 20%
y_pred, set_pred = lac_cp.predict(X_test, alpha=.2)

# Compute marginal coverage and average prediction set size
coverage = classification_mean_coverage(y_test, set_pred)
size = classification_mean_size(set_pred)

print(f"Marginal coverage: {np.round(coverage, 2)}")
print(f"Average prediction set size: {np.round(size, 2)}")

fit(*, X=None, y=None, fit_ratio=0.8, X_fit=None, y_fit=None, X_calib=None, y_calib=None, use_cached=False, **kwargs)

This method fits the models on the fit data and computes nonconformity scores on the calibration data. If (X, y) are provided, the data is randomly split into fit and calibration subsets according to fit_ratio. If (X_fit, y_fit) and (X_calib, y_calib) are provided, the conformalization is performed on the user-defined fit and calibration sets.

Note

If X and y are provided, fit ignores any user-defined fit/calib split.
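The random split driven by fit_ratio can be pictured with the following sketch. It is only an illustration in plain Python: the function split_fit_calib is hypothetical, and the source does not specify how puncc shuffles or rounds internally.

```python
import random

def split_fit_calib(X, y, fit_ratio=0.8, random_state=None):
    """Randomly assign a fraction fit_ratio of the samples to the fit subset."""
    rng = random.Random(random_state)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    n_fit = int(fit_ratio * len(X))
    fit_idx, calib_idx = indices[:n_fit], indices[n_fit:]
    X_fit = [X[i] for i in fit_idx]
    y_fit = [y[i] for i in fit_idx]
    X_calib = [X[i] for i in calib_idx]
    y_calib = [y[i] for i in calib_idx]
    return X_fit, y_fit, X_calib, y_calib

# Ten toy samples: eight go to the fit subset, two to calibration
X = [[i] for i in range(10)]
y = list(range(10))
X_fit, y_fit, X_calib, y_calib = split_fit_calib(X, y, fit_ratio=0.8, random_state=0)
print(len(X_fit), len(X_calib))
```

Keeping the two subsets disjoint matters: the calibration scores are only meaningful if the model has not seen the calibration samples during fitting.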

Parameters:

X (Iterable) – features from the training dataset.
y (Iterable) – labels from the training dataset.
fit_ratio (float) – the proportion of samples assigned to the fit subset.
X_fit (Iterable) – features from the fit dataset.
y_fit (Iterable) – labels from the fit dataset.
X_calib (Iterable) – features from the calibration dataset.
y_calib (Iterable) – labels from the calibration dataset.
use_cached (bool) – if set, the nonconformity scores computed in previous calls (if any) are added to the pool estimated in the current call to fit. The aggregation follows the CV+ procedure.
kwargs (dict) – configuration to be passed to the model’s fit method.

Raises:

RuntimeError – no dataset provided.

get_nonconformity_scores()

Get computed nonconformity scores.

Returns: computed nonconformity scores.
Return type: ndarray

predict(X_test, alpha)

Conformal set predictions (w.r.t target miscoverage alpha) for new samples.

Parameters:

X_test (Iterable) – features of new samples.
alpha (float) – target maximum miscoverage.

Returns:

Tuple composed of the model estimate y_pred and the prediction set set_pred

Return type:

Tuple
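To make the role of alpha concrete: the prediction sets are expected to contain the true label for at least a fraction 1 - alpha of new samples on average. The sketch below shows how empirical coverage and average set size can be computed by hand. The names mean_coverage and mean_size are hypothetical, and the sketch assumes that set_pred holds one collection of class labels per sample.

```python
def mean_coverage(y_true, set_pred):
    """Fraction of samples whose true label belongs to its prediction set."""
    hits = sum(1 for label, pred in zip(y_true, set_pred) if label in pred)
    return hits / len(y_true)

def mean_size(set_pred):
    """Average number of classes per prediction set."""
    return sum(len(pred) for pred in set_pred) / len(set_pred)

# Toy example: four samples, one of which receives an empty set
y_true = [0, 1, 1, 0]
set_pred = [[0], [0, 1], [0], []]

print(mean_coverage(y_true, set_pred))  # 0.5
print(mean_size(set_pred))              # 1.0
```

On a real test set, a coverage well below 1 - alpha usually suggests that the calibration and test data are not exchangeable.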

class deel.puncc.classification.RAPS(predictor, train=True, random_state=None, lambd=0, k_reg=1, rand=True)

Implementation of Regularized Adaptive Prediction Sets (RAPS). The hyperparameters \(\lambda\) and \(k_{reg}\) are used to encourage small prediction sets. For more details, we refer the user to the theory overview page.

Parameters:

predictor (BasePredictor) – a predictor implementing fit and predict.
train (bool) – if False, prediction model(s) will not be trained and will be used as is. Defaults to True.
random_state (float) – random seed used when the user does not provide a custom fit/calibration split in fit method.
lambd (float) – positive weight of the regularization term that encourages small set sizes. If \(\lambda = 0\), there is no regularization and the implementation reduces to APS.
k_reg (float) – class rank (ordered by descending probability) starting from which the regularization is applied. For example, if \(k_{reg} = 3\), then the fourth most likely estimated class has an extra penalty of size \(\lambda\).
rand (bool) – turn on or off the randomization used in the RAPS algorithm. One consequence of turning off randomization is avoiding empty prediction sets.

Note

If \(\lambda = 0\), there is no regularization and the implementation is equivalent to APS.
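The effect of \(\lambda\) and \(k_{reg}\) on the nonconformity score can be sketched as follows. This follows the score commonly given for RAPS in the literature (cumulative probability mass up to the true class plus a rank penalty) and is not taken from puncc's source. The function raps_score and its argument u are hypothetical: u = 1 corresponds to the non-randomized score, while a randomized variant would draw u uniformly in [0, 1].

```python
def raps_score(probs, label, lambd=0.0, k_reg=1, u=1.0):
    """Regularized adaptive score of one sample (illustrative sketch)."""
    # Classes sorted by descending estimated probability
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    rank = order.index(label) + 1  # 1-based rank of the true class
    # Probability mass of the classes ranked strictly before the true class
    mass_before = sum(probs[k] for k in order[:rank - 1])
    # Penalty grows by lambd for each rank beyond k_reg
    penalty = lambd * max(0, rank - k_reg)
    return mass_before + u * probs[label] + penalty

probs = [0.5, 0.3, 0.15, 0.05]

# With k_reg = 3, the fourth most likely class pays one extra lambd
print(raps_score(probs, label=3, lambd=0.1, k_reg=3))
# With lambd = 0, the score is the plain cumulative mass (APS-like)
print(raps_score(probs, label=3, lambd=0.0, k_reg=3))
```

Because the penalty inflates the scores of unlikely classes, fewer of them fall below the calibrated threshold, which is what keeps the prediction sets small.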

Example:

from deel.puncc.classification import RAPS
from deel.puncc.api.prediction import BasePredictor

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

from deel.puncc.metrics import classification_mean_coverage
from deel.puncc.metrics import classification_mean_size

import numpy as np

from tensorflow.keras.utils import to_categorical


# Generate a random classification problem
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_classes=2, random_state=0, shuffle=False)

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Split train data into fit and calibration
X_fit, X_calib, y_fit, y_calib = train_test_split(
    X_train, y_train, test_size=.2, random_state=0
)

# One hot encoding of classes
y_fit_cat = to_categorical(y_fit)
y_calib_cat = to_categorical(y_calib)
y_test_cat = to_categorical(y_test)

# Create rf classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=0)

# Create a wrapper of the random forest model to redefine its predict method
# so that it returns class probabilities. Make sure to subclass BasePredictor.
# Note that we needed to build a new wrapper (over BasePredictor) only because
# the predict(.) method of RandomForestClassifier returns class labels rather
# than probabilities. Otherwise, it is enough to use BasePredictor
# (e.g., neural network with softmax output).
class RFPredictor(BasePredictor):
    def predict(self, X, **kwargs):
        return self.model.predict_proba(X, **kwargs)

# Wrap model in the newly created RFPredictor
rf_predictor = RFPredictor(rf_model)

# CP method initialization
raps_cp = RAPS(rf_predictor, k_reg=2, lambd=1)

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
raps_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)


# The predict method infers prediction sets with respect to
# the significance level alpha = 20%
y_pred, set_pred = raps_cp.predict(X_test, alpha=.2)

# Compute marginal coverage and average prediction set size
coverage = classification_mean_coverage(y_test, set_pred)
size = classification_mean_size(set_pred)

print(f"Marginal coverage: {np.round(coverage, 2)}")
print(f"Average prediction set size: {np.round(size, 2)}")

fit(*, X=None, y=None, fit_ratio=0.8, X_fit=None, y_fit=None, X_calib=None, y_calib=None, **kwargs)

This method fits the models on the fit data and computes nonconformity scores on the calibration data. If (X, y) are provided, the data is randomly split into fit and calibration subsets according to fit_ratio. If (X_fit, y_fit) and (X_calib, y_calib) are provided, the conformalization is performed on the user-defined fit and calibration sets.

Note

If X and y are provided, fit ignores any user-defined fit/calib split.

Parameters:

X (Iterable) – features from the training dataset.
y (Iterable) – labels from the training dataset.
fit_ratio (float) – the proportion of samples assigned to the fit subset.
X_fit (Iterable) – features from the fit dataset.
y_fit (Iterable) – labels from the fit dataset.
X_calib (Iterable) – features from the calibration dataset.
y_calib (Iterable) – labels from the calibration dataset.
kwargs (dict) – configuration to be passed to the model’s fit method.

Raises:

RuntimeError – no dataset provided.

predict(X_test, alpha)

Conformal set predictions (w.r.t target miscoverage alpha) for new samples.

Parameters:

X_test (Iterable) – features of new samples.
alpha (float) – target maximum miscoverage.

Returns:

Tuple composed of the model estimate y_pred and the prediction set set_pred

Return type:

Tuple

class deel.puncc.classification.APS(predictor, train=True, rand=True)

Implementation of Adaptive Prediction Sets (APS). For more details, we refer the user to the theory overview page.

Parameters:

predictor (BasePredictor) – a predictor implementing fit and predict.
train (bool) – if False, prediction model(s) will not be trained and will be used as is. Defaults to True.
rand (bool) – turn on or off the randomization used in the APS algorithm. One consequence of turning off randomization is avoiding empty prediction sets.
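The link between randomization and empty prediction sets can be illustrated with a short sketch. It uses the randomization rule commonly described for APS in the literature, which is an assumption here and may differ from puncc's implementation. The function aps_prediction_set and its argument u are hypothetical: passing u=None mimics rand=False, while a value of u in [0, 1] stands for the uniform random draw.

```python
def aps_prediction_set(probs, threshold, u=None):
    """Add classes by descending probability until the threshold is reached."""
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    selected, cumulative = [], 0.0
    for k in order:
        selected.append(k)
        cumulative += probs[k]
        if cumulative >= threshold:
            break
    if u is not None and probs[selected[-1]] > 0:
        # Randomization: drop the last class with probability equal to the
        # share of its mass that overshoots the threshold
        overshoot = (cumulative - threshold) / probs[selected[-1]]
        if u <= overshoot:
            selected.pop()
    return selected

# Without randomization, the most likely class is always kept
print(aps_prediction_set([0.9, 0.05, 0.05], threshold=0.5))         # [0]
# With randomization, the only selected class may be dropped
print(aps_prediction_set([0.9, 0.05, 0.05], threshold=0.5, u=0.1))  # []
```

In this sketch the non-randomized sets are never empty but tend to be slightly larger, so coverage is more conservative.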

Example:

from deel.puncc.classification import APS
from deel.puncc.api.prediction import BasePredictor

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

from deel.puncc.metrics import classification_mean_coverage
from deel.puncc.metrics import classification_mean_size

import numpy as np

from tensorflow.keras.utils import to_categorical

# Generate a random classification problem
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_classes=2, random_state=0, shuffle=False)

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=.2, random_state=0
)

# Split train data into fit and calibration
X_fit, X_calib, y_fit, y_calib = train_test_split(
    X_train, y_train, test_size=.2, random_state=0
)

# One hot encoding of classes
y_fit_cat = to_categorical(y_fit)
y_calib_cat = to_categorical(y_calib)
y_test_cat = to_categorical(y_test)

# Create rf classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=0)

# Create a wrapper of the random forest model to redefine its predict method
# so that it returns class probabilities. Make sure to subclass BasePredictor.
# Note that we needed to build a new wrapper (over BasePredictor) only because
# the predict(.) method of RandomForestClassifier returns class labels rather
# than probabilities. Otherwise, it is enough to use BasePredictor
# (e.g., neural network with softmax output).
class RFPredictor(BasePredictor):
    def predict(self, X, **kwargs):
        return self.model.predict_proba(X, **kwargs)

# Wrap model in the newly created RFPredictor
rf_predictor = RFPredictor(rf_model)

# CP method initialization
aps_cp = APS(rf_predictor)

# The call to `fit` trains the model and computes the nonconformity
# scores on the calibration set
aps_cp.fit(X_fit=X_fit, y_fit=y_fit, X_calib=X_calib, y_calib=y_calib)

# The predict method infers prediction sets with respect to
# the significance level alpha = 20%
y_pred, set_pred = aps_cp.predict(X_test, alpha=.2)

# Compute marginal coverage and average prediction set size
coverage = classification_mean_coverage(y_test, set_pred)
size = classification_mean_size(set_pred)

print(f"Marginal coverage: {np.round(coverage, 2)}")
print(f"Average prediction set size: {np.round(size, 2)}")

fit(*, X=None, y=None, fit_ratio=0.8, X_fit=None, y_fit=None, X_calib=None, y_calib=None, **kwargs)

This method fits the models on the fit data and computes nonconformity scores on the calibration data. If (X, y) are provided, the data is randomly split into fit and calibration subsets according to fit_ratio. If (X_fit, y_fit) and (X_calib, y_calib) are provided, the conformalization is performed on the user-defined fit and calibration sets.

Note

If X and y are provided, fit ignores any user-defined fit/calib split.

Parameters:

X (Iterable) – features from the training dataset.
y (Iterable) – labels from the training dataset.
fit_ratio (float) – the proportion of samples assigned to the fit subset.
X_fit (Iterable) – features from the fit dataset.
y_fit (Iterable) – labels from the fit dataset.
X_calib (Iterable) – features from the calibration dataset.
y_calib (Iterable) – labels from the calibration dataset.
kwargs (dict) – configuration to be passed to the model’s fit method.

Raises:

RuntimeError – no dataset provided.

predict(X_test, alpha)

Conformal set predictions (w.r.t target miscoverage alpha) for new samples.

Parameters:

X_test (Iterable) – features of new samples.
alpha (float) – target maximum miscoverage.

Returns:

Tuple composed of the model estimate y_pred and the prediction set set_pred

Return type:

Tuple