deel.datasets package

deel.datasets.load(dataset, mode=None, version='latest', force_update=False, with_info=False, settings=None, **kwargs)

Load the given dataset using the given arguments.

Parameters
  • dataset (str) – Dataset to load.

  • mode (Optional[str]) – Mode to use. The “path” mode is always available and simply returns the path to the local dataset. Each dataset has its own set of available modes.

  • version (str) – Version of the dataset.

  • force_update (bool) – Force update of the local dataset if possible.

  • with_info (bool) – Returns information about the dataset alongside the actual dataset(s).

  • settings (Optional[Settings]) – Settings to use to load the dataset.

  • **kwargs – Extra arguments for the given dataset and mode.

Return type

Any

Returns

The dataset in the format specified by mode.

Raises
  • DatasetNotFoundError – If the dataset does not exist.

  • ImportError – If the plugin could not be loaded.

Subpackages

Submodules

deel.datasets.dataset module

class deel.datasets.dataset.BaseDataset(name, version='latest', settings=None)

Bases: object

Base dataset for all dataset types.

Creates a new dataset of the given name and version.

Parameters
  • name (str) – Name of the dataset.

  • version (str) – Version of the dataset.

  • settings (Optional[Settings]) – The settings to use for this dataset, or None to use the default settings.

property available_modes: List[str]

Retrieve the list of available modes for this dataset.

Return type

List[str]

Returns

The list of available modes for this dataset.

property default_mode: str

Retrieve the default mode for this dataset.

Return type

str

Returns

The default mode for this dataset.

abstract load(mode=None, with_info=False, **kwargs)

Load this dataset as specified by mode.

Parameters
  • mode (Optional[str]) – Mode to load the dataset, or None to use the default mode.

  • with_info (bool) – Returns information about the dataset alongside the actual dataset(s).

  • **kwargs – Extra arguments for the specific mode.

Return type

Any

Returns

The dataset as specified by mode and the given extra arguments.

Raises

InvalidModeError – If the given mode is not available for this dataset.

property name: str

The name of the dataset.

Return type

str

property version: str

The requested version of the dataset.

Return type

str

class deel.datasets.dataset.Dataset(name, version='latest', settings=None)

Bases: BaseDataset

Dataset is the base class for all DEEL datasets and can be used as a non-specific dataset handler.

A Dataset object can be extended to easily interface with the local file system and access dataset files using the load method.

A dataset can be loaded using different modes (see available_modes and default_mode). Inheriting classes can add extra modes by providing a load_MODE method, and can change the default mode by overriding _default_mode.

Example

Basic usage of the Dataset class is via the load method.

>>> dataset = Dataset("blink")
>>> dataset.load()
PosixPath('/home/username/.deel/datasets/blink/3.0.1')

Creates a new dataset of the given name and version.

Parameters
  • name (str) – Name of the dataset.

  • version (str) – Version of the dataset.

  • settings (Optional[Settings]) – The settings to use for this dataset, or None to use the default settings.

load(mode=None, with_info=False, force_update=False, **kwargs)

Load this dataset as specified by mode.

This method checks that the given mode is valid, retrieves the dataset files using a Provider, and then dispatches the actual loading of the data to a load_MODE method.

If this dataset consists of a single file, as specified by _single_file, the path of that file is used; otherwise, the path of the dataset folder is used.

Parameters
  • mode (Optional[str]) – Mode to load the dataset, or None to use the default mode.

  • force_update (bool) – Force update of the dataset if possible.

  • with_info (bool) – Returns information about the dataset alongside the actual dataset(s).

  • **kwargs – Extra arguments for the specific mode.

Return type

Any

Returns

The dataset as specified by mode and the given extra arguments.

Raises

InvalidModeError – If the given mode is not available for this dataset.
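
The dispatch described above can be sketched without the library. The snippet below is a minimal stand-in, not the actual deel.datasets implementation: SketchDataset, SketchInvalidModeError, LinesDataset and the load_lines mode are hypothetical names, discovering modes by scanning for load_ methods is an assumption, and retrieval through a Provider is replaced by a path passed in directly.

```python
import tempfile
from pathlib import Path


class SketchInvalidModeError(Exception):
    """Hypothetical stand-in for InvalidModeError(dataset, mode)."""

    def __init__(self, dataset, mode):
        super().__init__(f"Mode '{mode}' is not available for '{dataset.name}'.")
        self.dataset = dataset
        self.mode = mode


class SketchDataset:
    """Hypothetical stand-in illustrating load_MODE dispatch."""

    _default_mode = "path"

    def __init__(self, name, local_path):
        self.name = name
        # Assumption: the real class obtains this path from a Provider.
        self._local_path = Path(local_path)

    @property
    def available_modes(self):
        # Assumption: a mode exists for every callable named load_MODE.
        return sorted(
            attr[len("load_"):]
            for attr in dir(self)
            if attr.startswith("load_") and callable(getattr(self, attr))
        )

    def load(self, mode=None, **kwargs):
        mode = self._default_mode if mode is None else mode
        if mode not in self.available_modes:
            raise SketchInvalidModeError(self, mode)
        # Dispatch to the method matching the requested mode.
        return getattr(self, f"load_{mode}")(self._local_path, **kwargs)

    def load_path(self, path):
        return path


class LinesDataset(SketchDataset):
    """Adds a hypothetical 'lines' mode and makes it the default."""

    _default_mode = "lines"

    def load_lines(self, path, strip=True):
        lines = path.read_text().splitlines()
        return [line.strip() for line in lines] if strip else lines


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.txt"
    data_file.write_text("a \nb\n")
    dataset = LinesDataset("demo", data_file)
    modes = dataset.available_modes
    lines = dataset.load()
    as_path = dataset.load(mode="path")
```

In this sketch, adding a mode only requires defining one more load_MODE method on the subclass; requesting an unknown mode fails before any file is touched.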

load_path(path)

Load method for path mode.

Parameters

path (Path) – Path of the dataset.

Return type

Path

Returns

The actual path to the dataset.

exception deel.datasets.dataset.InvalidModeError(dataset, mode)

Bases: Exception

Exception raised when a mode is not available for a given dataset.

Parameters
  • dataset (BaseDataset) – Dataset for which the mode is not available.

  • mode (str) – Mode not available.

class deel.datasets.dataset.VolatileDataset(name, version='latest', settings=None)

Bases: BaseDataset

Datasets that are generated on the fly.

Creates a new dataset of the given name and version.

Parameters
  • name (str) – Name of the dataset.

  • version (str) – Version of the dataset.

  • settings (Optional[Settings]) – The settings to use for this dataset, or None to use the default settings.

load(mode=None, with_info=False, **kwargs)

Load this dataset as specified by mode.

This method checks that the given mode is valid and generates the dataset using the given load_MODE method.

Parameters
  • mode (Optional[str]) – Mode to load the dataset, or None to use the default mode.

  • with_info (bool) – Returns information about the dataset alongside the actual dataset(s).

  • **kwargs – Extra arguments for the specific mode.

Return type

Any

Returns

The dataset as specified by mode and the given extra arguments.

Raises

InvalidModeError – If the given mode is not available for this dataset.
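
As an illustration of a dataset generated on the fly, the sketch below builds its samples in memory instead of reading files. It does not use the library: SketchVolatileDataset and NoisyLine are hypothetical classes, the n_samples and seed arguments are invented for the example, and a plain ValueError stands in for InvalidModeError.

```python
import random


class SketchVolatileDataset:
    """Hypothetical stand-in for a dataset with no files on disk."""

    _default_mode = "basic"

    def __init__(self, name, version="latest"):
        self.name = name
        self.version = version

    def load(self, mode=None, **kwargs):
        mode = self._default_mode if mode is None else mode
        loader = getattr(self, f"load_{mode}", None)
        if loader is None:
            # The documented class raises InvalidModeError here.
            raise ValueError(f"Mode '{mode}' is not available for '{self.name}'.")
        # No path is passed: the data is generated, not read.
        return loader(**kwargs)

    def load_basic(self):
        raise NotImplementedError("Subclasses generate their data here.")


class NoisyLine(SketchVolatileDataset):
    """Generates (x, y) pairs close to the line y = 2x."""

    def load_basic(self, n_samples=5, seed=0):
        rng = random.Random(seed)  # seeded so that runs are reproducible
        return [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in range(n_samples)]


samples = NoisyLine("noisy-line").load(n_samples=3, seed=42)
```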

abstract load_basic()

Load method for basic mode.

Returns

The dataset generated for basic mode.

deel.datasets.settings module

exception deel.datasets.settings.ParseSettingsError

Bases: Exception

Exception raised if an issue occurs while parsing the settings.

class deel.datasets.settings.Settings(version, provider_list, path, default_provider='')

Bases: object

The Settings class is a read-only class that contains settings for the deel.datasets package.

Settings are stored in YAML format. The default location of the settings file is $HOME/.deel/config.yml. The DEEL_DATASETS_CONF environment variable can be used to override this location.

Parameters
  • version (int) – Version of the settings.

  • provider_type – Type of the provider.

  • provider_options – Options for the provider.

  • path (Path) – Local storage path for the datasets.
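
The lookup of the settings file described above might be sketched as follows. This is an illustration under stated assumptions rather than the library's code: resolve_settings_path is a hypothetical helper, and treating an empty DEEL_DATASETS_CONF as unset is a guess.

```python
import os
from pathlib import Path

ENV_VAR = "DEEL_DATASETS_CONF"


def resolve_settings_path(environ=None):
    """Return the settings file location (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    override = environ.get(ENV_VAR)
    if override:  # assumption: an empty value counts as unset
        return Path(override)
    # Documented default: $HOME/.deel/config.yml
    return Path.home() / ".deel" / "config.yml"


custom = resolve_settings_path({ENV_VAR: "/tmp/custom.yml"})
default = resolve_settings_path({})
```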

get_best_provider(dataset)

Search for and return the best settings provider. If the default provider is defined in the configuration file, it is returned. Otherwise, if dataset is not None, the first settings provider that contains this dataset is returned. Otherwise, the local settings provider is returned.

Parameters

dataset (str) – Dataset name.

Return type

SettingsProvider

Returns

The provider to use
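
The selection order can be sketched with plain dictionaries. Everything here is hypothetical: pick_provider is not a library function, providers are modelled as a mapping from provider name to the set of dataset names they hold, and "local" is only a placeholder for the local settings provider.

```python
from typing import Dict, Optional, Set

LOCAL = "local"  # placeholder name for the local settings provider


def pick_provider(
    providers: Dict[str, Set[str]],
    default_provider: str = "",
    dataset: Optional[str] = None,
) -> str:
    """Return the name of the provider to use (hypothetical helper)."""
    # 1. A default provider defined in the configuration wins.
    if default_provider in providers:
        return default_provider
    # 2. Otherwise, take the first provider that contains the dataset.
    if dataset is not None:
        for name, datasets in providers.items():
            if dataset in datasets:
                return name
    # 3. Otherwise, fall back to the local provider.
    return LOCAL


providers = {"remote-a": {"blink"}, "remote-b": {"blink", "other"}}
chosen_default = pick_provider(providers, default_provider="remote-b", dataset="blink")
chosen_first = pick_provider(providers, dataset="blink")
chosen_local = pick_provider(providers, dataset="missing")
```

The sketch relies on dictionaries preserving insertion order, so "first" means first in the order the providers were declared.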

get_provider_list()

Return type

Dict[str, SettingsProvider]

property local_storage: Path

The path to the local storage for the datasets.

Return type

Path

make_provider(dataset='')

Creates and returns the provider corresponding to these settings.

Parameters

dataset (str) – Dataset name.

Return type

Provider

Returns

A new Provider created from these settings.

class deel.datasets.settings.SettingsProvider(provider_type, provider_options)

Bases: object

Parameters
  • provider_type (str) – Type of the provider.

  • provider_options (Dict[str, Any]) – Options for the provider.

create_provider(base)

Creates and returns the provider corresponding to these settings.

Parameters

base (Path) – Path of the root directory.

Return type

Provider

Returns

A new Provider created from these settings.

deel.datasets.settings.get_default_settings(default_provider='')

Retrieve the default settings for the current machine.

Parameters

default_provider (str) – Optional default provider to use.

Return type

Settings

Returns

The default settings for the current machine.

deel.datasets.settings.get_settings_for_local()

Retrieve the local default settings.

Return type

Settings

Returns

The settings for local.

deel.datasets.settings.read_one_provider(data, version)

Load the settings of one provider from the given dictionary (parsed YAML data).

Parameters

data (Dict[str, Any]) – Dictionary holding one element of the YAML settings file.

Return type

SettingsProvider

Returns

A SettingsProvider object constructed from the given data.

Raises
  • yaml.YAMLError – If the given stream does not contain valid YAML.

  • ParseSettingsError – If the given YAML is not valid for settings.

deel.datasets.settings.read_settings(stream, default_provider='')

Load Settings from the given YAML stream.

Parameters
  • stream (TextIO) – File-like object containing the configuration.

  • default_provider (str) – Default provider to use.

Return type

Settings

Returns

A Settings object constructed from the given YAML stream.

Raises
  • yaml.YAMLError – If the given stream does not contain valid YAML.

  • ParseSettingsError – If the given YAML is not valid for settings.

deel.datasets.settings.write_settings(settings, stream, **kwargs)

Write the given Settings to the given stream as YAML.

Parameters
  • settings (Settings) – Settings to write.

  • stream (TextIO) – File-like object where the configuration will be written.

  • **kwargs – Extra arguments for the yaml.safe_dump method.