Splitting

Helpers that define how data points are assigned to fit and calibration sets.

class splitting.BaseSplitter(random_state=None)

Abstract structure of a splitter. A splitter provides a function that assigns data points to fit and calibration sets.

Parameters:

random_state (int) – seed to control random generation.
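The splitter contract described above can be illustrated with a minimal sketch. It is not the library's implementation: the names SplitterSketch and FirstHalfSplitter are hypothetical, and the return shape (a list of (X_fit, y_fit, X_calib, y_calib) tuples) is assumed from the concrete splitters documented on this page.

```python
from abc import ABC, abstractmethod


class SplitterSketch(ABC):
    """Hypothetical stand-in for an abstract splitter (not the library class)."""

    def __init__(self, random_state=None):
        # Seed kept for subclasses that need reproducible randomness.
        self.random_state = random_state

    @abstractmethod
    def __call__(self, X, y):
        """Return a list of (X_fit, y_fit, X_calib, y_calib) tuples."""


class FirstHalfSplitter(SplitterSketch):
    """Illustrative subclass: first half for fitting, second half for calibration."""

    def __call__(self, X, y):
        mid = len(X) // 2
        return [(X[:mid], y[:mid], X[mid:], y[mid:])]


splits = FirstHalfSplitter()([[1], [2], [3], [4]], [0, 1, 0, 1])
X_fit, y_fit, X_calib, y_calib = splits[0]
```

Any callable that follows this shape could, under the same assumption, be used wherever a splitter is expected.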

class splitting.IdSplitter(X_fit, y_fit, X_calib, y_calib)

Identity splitter that wraps an already existing data assignment.

Parameters:
  • X_fit (Iterable) – Fit features.

  • y_fit (Iterable) – Fit labels.

  • X_calib (Iterable) – calibration features.

  • y_calib (Iterable) – calibration labels.

__call__(X=None, y=None)

Wraps the provided fit and calibration subsets into a splitter.

Parameters:
  • X (Iterable) – features array. Not needed here, just a placeholder for interoperability.

  • y (Iterable) – labels array. Not needed here, just a placeholder for interoperability.

Returns:

List of one tuple of deterministic subsets (X_fit, y_fit, X_calib, y_calib).

Return type:

List[Tuple[Iterable]]
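As a rough illustration of the behavior described above, the sketch below mimics an identity splitter in plain Python. IdSplitterSketch is a hypothetical name and the code is an assumption about the behavior, not the library's source: it stores the four subsets and returns them unchanged, ignoring X and y.

```python
class IdSplitterSketch:
    """Hypothetical stand-in for an identity splitter (not the library class)."""

    def __init__(self, X_fit, y_fit, X_calib, y_calib):
        # The assignment is fixed at construction time.
        self._split = [(X_fit, y_fit, X_calib, y_calib)]

    def __call__(self, X=None, y=None):
        # X and y are accepted only to match the signature of other splitters.
        return self._split


splitter = IdSplitterSketch([[0.0], [1.0]], [0, 1], [[2.0]], [1])
id_splits = splitter()
```

This is useful when the fit and calibration sets already exist, for example when they come from separate files, and no further shuffling is wanted.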

class splitting.RandomSplitter(ratio, random_state=None)

Random splitter that assigns samples to the fit and calibration sets according to a given ratio.

Parameters:
  • ratio (float) – fraction of the data assigned to the fit set (the remaining 1 - ratio goes to the calibration set).

  • random_state (int) – seed to control random generation.

__call__(X, y)

Implements a random split strategy.

Parameters:
  • X (Iterable) – features array.

  • y (Iterable) – labels array.

Returns:

List of one tuple of random subsets (X_fit, y_fit, X_calib, y_calib).

Return type:

List[Tuple[Iterable]]
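The random split strategy can be sketched with the standard library alone. The function random_split below is hypothetical and only approximates the documented behavior. In particular, truncating ratio * n to an integer is an assumption, and the library may round differently.

```python
import random


def random_split(X, y, ratio, random_state=None):
    """Sketch of a random fit/calibration split (assumed behavior)."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must lie strictly between 0 and 1")
    n = len(X)
    indices = list(range(n))
    # A dedicated generator keeps the split reproducible for a given seed.
    random.Random(random_state).shuffle(indices)
    n_fit = int(ratio * n)  # assumption: truncate toward zero
    fit_idx, calib_idx = indices[:n_fit], indices[n_fit:]
    X_fit = [X[i] for i in fit_idx]
    y_fit = [y[i] for i in fit_idx]
    X_calib = [X[i] for i in calib_idx]
    y_calib = [y[i] for i in calib_idx]
    return [(X_fit, y_fit, X_calib, y_calib)]


X = [[i] for i in range(10)]
y = list(range(10))
rand_splits = random_split(X, y, ratio=0.8, random_state=0)
```

With ratio=0.8 and ten samples, eight land in the fit set and two in the calibration set. Passing the same random_state again reproduces the same assignment.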

class splitting.KFoldSplitter(K, random_state=None)

K-fold data splitter.

Parameters:
  • K (int) – number of folds to generate.

  • random_state (int) – seed to control random generation.

__call__(X, y)

Implements a K-fold split strategy.

Parameters:
  • X (Iterable) – features array.

  • y (Iterable) – labels array.

Returns:

List of K splits, one per fold. Each split is a tuple (X_fit, y_fit, X_calib, y_calib).

Return type:

List[Tuple[Iterable]]
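A K-fold split can be sketched as follows. The function kfold_split is hypothetical and makes two assumptions that this page does not confirm: each fold serves once as the calibration set while the other K - 1 folds form the fit set, and the data are shuffled only when a seed is given.

```python
import random


def kfold_split(X, y, K, random_state=None):
    """Sketch of a K-fold fit/calibration split (assumed behavior)."""
    n = len(X)
    if not 2 <= K <= n:
        raise ValueError("K must be between 2 and the number of samples")
    indices = list(range(n))
    if random_state is not None:
        # Assumption: shuffle only when a seed is provided.
        random.Random(random_state).shuffle(indices)
    # Fold k takes every K-th index starting at position k.
    folds = [indices[k::K] for k in range(K)]
    splits = []
    for k, calib_idx in enumerate(folds):
        # All indices outside fold k go to the fit set.
        fit_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((
            [X[i] for i in fit_idx], [y[i] for i in fit_idx],
            [X[i] for i in calib_idx], [y[i] for i in calib_idx],
        ))
    return splits


X = [[i] for i in range(10)]
y = list(range(10))
kfold_splits = kfold_split(X, y, K=5, random_state=0)
```

Under these assumptions every sample appears in exactly one calibration set across the K returned tuples.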