Anomaly detection
The conformal anomaly detectors currently implemented are listed on this page.
Each of these wrappers calibrates the decision threshold of the anomaly detector
passed as an argument to its constructor. Such models need to
implement the fit() and predict() methods.
The prediction module of the API ensures that models from various ML/DL libraries
(such as Keras and scikit-learn) comply with puncc.
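As an illustration of the fit()/predict() contract described above, here is a minimal sketch of a toy anomaly scorer. The class name MeanDistanceScorer and its scoring rule are hypothetical and not part of puncc; only the two method names come from this page.

```python
import numpy as np


class MeanDistanceScorer:
    """Hypothetical toy model: scores a point by its distance to the training mean."""

    def fit(self, X):
        # Learn the center of the training data
        self.center_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def predict(self, X):
        # Larger value = farther from the center = more anomalous
        X = np.asarray(X, dtype=float)
        return np.linalg.norm(X - self.center_, axis=1)


model = MeanDistanceScorer().fit(np.array([[0.0, 0.0], [2.0, 0.0]]))
scores = model.predict(np.array([[1.0, 0.0], [4.0, 4.0]]))

# In puncc, such a model would then be wrapped before being handed to a
# detector, e.g. BasePredictor(model), as in the SplitCAD example on this page.
```

Because predict() already returns "higher is more anomalous" scores, a model like this should not need the sign flip that the Isolation Forest example applies to score_samples.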
- class deel.puncc.anomaly_detection.SplitCAD(predictor, *, train=True, random_state=None)
Split conformal anomaly detection method based on Laxhammar's algorithm. Anomalies are detected by comparing the underlying model's anomaly scores with a threshold calibrated through conformal prediction. For more details, we refer the user to the theory overview page.
- Parameters:
predictor (BasePredictor) – a predictor implementing fit and predict.
train (bool) – if False, prediction model(s) will not be (re)trained. Defaults to True.
random_state (float) – random seed used when the user does not provide a custom fit/calibration split in fit method.
Example:
    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.datasets import make_moons
    import matplotlib.pyplot as plt

    from deel.puncc.anomaly_detection import SplitCAD
    from deel.puncc.api.prediction import BasePredictor

    # We generate the two moons dataset
    dataset = 4 * make_moons(n_samples=1000, noise=0.05, random_state=0)[
        0
    ] - np.array([0.5, 0.25])

    # We generate uniformly new (test) data points
    rng = np.random.RandomState(42)
    z_test = rng.uniform(low=-6, high=6, size=(150, 2))

    # The nonconformity scores are defined as the IF scores (anomaly score).
    # By default, score_samples returns the opposite of IF scores.
    # We need to redefine predict to output the nonconformity scores.
    class ADPredictor(BasePredictor):
        def predict(self, X):
            return -self.model.score_samples(X)

    # Instantiate the Isolation Forest (IF) anomaly detection model
    # and wrap it in a predictor
    if_predictor = ADPredictor(IsolationForest(random_state=42))

    # Instantiate CAD on top of IF predictor
    if_cad = SplitCAD(if_predictor, train=True, random_state=0)

    # Fit the IF on the proper fitting dataset and
    # calibrate it using the calibration dataset.
    # The two datasets are sampled randomly with a ratio of 7:3,
    # respectively.
    if_cad.fit(z=dataset, fit_ratio=0.7)

    # We set the maximum false detection rate to 1%
    alpha = 0.01

    # The method `predict` is called on the new data points
    # to test which are anomalous and which are not
    results = if_cad.predict(z_test, alpha=alpha)

    anomalies = z_test[results]
    not_anomalies = z_test[np.invert(results)]

    # Plot results
    plt.scatter(dataset[:, 0], dataset[:, 1], s=10, label="Inliers")
    plt.scatter(
        anomalies[:, 0],
        anomalies[:, 1],
        marker="x",
        color="red",
        s=40,
        label="Anomalies",
    )
    plt.scatter(
        not_anomalies[:, 0],
        not_anomalies[:, 1],
        marker="x",
        color="blue",
        s=40,
        label="Normal",
    )
    plt.xticks(())
    plt.yticks(())
    plt.legend()
- fit(*, z=None, fit_ratio=0.8, z_fit=None, z_calib=None, **kwargs)
This method fits the models on the fit data and computes nonconformity scores on the calibration data. If z is provided, the data is randomly split into fit and calibration subsets according to fit_ratio. If z_fit and z_calib are provided instead, conformalization is performed on these user-defined fit and calibration sets.
Note
If z is provided, fit ignores any user-defined fit/calib split.
- Parameters:
z (Iterable) – data points from the training dataset.
fit_ratio (float) – the proportion of samples assigned to the fit subset.
z_fit (Iterable) – data points from the fit dataset.
z_calib (Iterable) – data points from the calibration dataset.
kwargs (dict) – configuration to be passed to the model's fit method.
- Raises:
RuntimeError – no dataset provided.
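The data-handling rules described for fit can be sketched as follows. This is a minimal illustration, not puncc's actual code: the helper name split_fit_calib, its random_state argument, and the use of a NumPy permutation are assumptions made for the example.

```python
import numpy as np


def split_fit_calib(*, z=None, fit_ratio=0.8, z_fit=None, z_calib=None,
                    random_state=None):
    """Hypothetical helper mirroring the documented behaviour of fit."""
    if z is not None:
        # z takes precedence: any user-defined fit/calib split is ignored
        z = np.asarray(z)
        rng = np.random.default_rng(random_state)
        shuffled = rng.permutation(len(z))
        n_fit = int(fit_ratio * len(z))
        return z[shuffled[:n_fit]], z[shuffled[n_fit:]]
    if z_fit is not None and z_calib is not None:
        # Use the split supplied by the user as-is
        return np.asarray(z_fit), np.asarray(z_calib)
    raise RuntimeError("No dataset provided.")


data = np.arange(20).reshape(10, 2)
fit_part, calib_part = split_fit_calib(z=data, fit_ratio=0.5, random_state=0)
```

The model would then be trained on the fit subset, and the nonconformity scores used for calibration would be computed on the calibration subset only.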
- predict(z_test, alpha)
Predict whether each example is an anomaly or not. The decision is taken based on the calibrated threshold (through conformal prediction) of underlying anomaly detection scores.
- Parameters:
z_test (Iterable) – new data points.
alpha (float) – target maximum false detection rate (FDR).
- Returns:
outlier tag. True if outlier, False otherwise.
- Return type:
Iterable[bool]
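One common way to express such a calibrated decision is through conformal p-values: a test point is flagged when the fraction of calibration scores at least as large as its own score falls to alpha or below. The sketch below assumes this formulation; the function name conformal_anomaly_flags is hypothetical, and puncc's internal computation may differ in its details.

```python
import numpy as np


def conformal_anomaly_flags(calib_scores, test_scores, alpha):
    """Hypothetical sketch: flag test points whose conformal p-value is <= alpha."""
    calib_scores = np.asarray(calib_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    n = len(calib_scores)
    # For each test score, count calibration scores that are at least as large
    n_at_least = (calib_scores[None, :] >= test_scores[:, None]).sum(axis=1)
    # The +1 terms account for the test point itself
    p_values = (n_at_least + 1) / (n + 1)
    return p_values <= alpha


calib = np.arange(1, 100)  # 99 calibration nonconformity scores: 1, ..., 99
flags = conformal_anomaly_flags(calib, [50.0, 200.0, 97.0], alpha=0.05)
```

Under this formulation, a score can only be flagged at level alpha if the calibration set holds at least 1/alpha - 1 points, which is worth keeping in mind when choosing fit_ratio for small datasets.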