deel.datasets package
- deel.datasets.load(dataset, mode=None, version='latest', force_update=False, with_info=False, settings=None, **kwargs)
Load the given dataset using the given arguments.
- Parameters
dataset (str) – Dataset to load.
mode (Optional[str]) – Mode to use. The “path” mode is always available and simply returns the path to the local dataset. Each dataset has its own set of available modes.
version (str) – Version of the dataset.
force_update (bool) – Force an update of the local dataset if possible.
with_info (bool) – Whether to return information about the dataset alongside the actual dataset(s).
settings (Optional[Settings]) – Settings to use to load the dataset.
**kwargs – Extra arguments for the given dataset and mode.
- Return type
Any
- Returns
The dataset in the format specified by mode.
- Raises
DatasetNotFoundError – If the dataset does not exist.
ImportError – If the plugin could not be loaded.
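As a hedged illustration of the call documented above, the sketch below requests the always-available "path" mode and handles the documented ImportError. The helper load_local_path and the injected load callable are hypothetical names, introduced only so that the example runs without the package installed. With the package available, deel.datasets.load would be passed in instead of the stand-in loaders.

```python
from pathlib import Path
from typing import Any, Callable, Optional


def load_local_path(
    load: Callable[..., Any], dataset: str, version: str = "latest"
) -> Optional[Path]:
    """Request the "path" mode and return the local dataset path.

    `load` is expected to follow the documented signature of
    deel.datasets.load; this wrapper itself is hypothetical.
    """
    try:
        return Path(load(dataset, mode="path", version=version))
    except ImportError:
        # Documented failure: the dataset plugin could not be loaded.
        return None


# Stand-in loaders, used only to keep this example self-contained.
def fake_load(
    dataset: str, mode: Optional[str] = None, version: str = "latest", **kwargs: Any
) -> Path:
    return Path("/tmp/datasets") / dataset / version


def broken_load(dataset: str, **kwargs: Any) -> Path:
    raise ImportError("plugin could not be loaded")


print(load_local_path(fake_load, "blink", version="3.0.1"))
print(load_local_path(broken_load, "blink"))
```

Injecting the loader keeps the error handling testable; DatasetNotFoundError is left unhandled here because its import location is not stated in this section.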
Subpackages
- deel.datasets.providers package
- Submodules
- deel.datasets.providers.exceptions module
- deel.datasets.providers.ftp_providers module
- deel.datasets.providers.gcloud_provider module
- deel.datasets.providers.http_providers module
- deel.datasets.providers.local_as_provider module
- deel.datasets.providers.local_provider module
- deel.datasets.providers.provider module
- deel.datasets.providers.remote_provider module
- deel.datasets.providers.webdav_provider module
- deel.datasets.utils package
Submodules
deel.datasets.dataset module
- class deel.datasets.dataset.BaseDataset(name, version='latest', settings=None)
Bases: object
Base dataset for all dataset types.
Creates a new dataset of the given name and version.
- Parameters
name (str) – Name of the dataset.
version (str) – Version of the dataset.
settings (Optional[Settings]) – The settings to use for this dataset, or None to use the default settings.
- property available_modes: List[str]
Retrieve the list of available modes for this dataset.
- Return type
List[str]
- Returns
The list of available modes for this dataset.
- property default_mode: str
Retrieve the default mode for this dataset.
- Return type
str
- Returns
The default mode for this dataset.
- abstract load(mode=None, with_info=False, **kwargs)
Load this dataset as specified by mode.
- Parameters
mode (Optional[str]) – Mode to load the dataset, or None to use the default mode.
with_info (bool) – Whether to return information about the dataset alongside the actual dataset(s).
**kwargs – Extra arguments for the specific mode.
- Return type
Any
- Returns
The dataset as specified by mode and the given extra arguments.
- Raises
InvalidModeError – If the given mode is not available for this dataset.
- property name: str
The name of the dataset.
- Return type
str
- property version: str
The requested version of the dataset.
- Return type
str
- class deel.datasets.dataset.Dataset(name, version='latest', settings=None)
Bases: BaseDataset
Dataset is the base class for all DEEL datasets and can be used as a non-specific dataset handler.
A Dataset object can be extended to easily interface with the local file system and access dataset files using the load method.
A dataset can be loaded using different modes (see available_modes and default_mode). Inheriting classes can add extra modes by providing a load_MODE method, and can change the default mode by overriding _default_mode.
Example
Basic usage of the Dataset class is via the load method.
>>> from deel.datasets.dataset import Dataset
>>> dataset = Dataset("blink")
>>> dataset.load()
PosixPath('/home/username/.deel/datasets/blink/3.0.1')
Creates a new dataset of the given name and version.
- Parameters
name (str) – Name of the dataset.
version (str) – Version of the dataset.
settings (Optional[Settings]) – The settings to use for this dataset, or None to use the default settings.
- load(mode=None, with_info=False, force_update=False, **kwargs)
Load this dataset as specified by mode.
This method checks that the given mode is valid, retrieves the dataset files using a Provider, and then dispatches the actual loading of the data to a load_MODE method.
If this dataset consists of a single file, as specified by _single_file, the path passed on is the path of that file; otherwise, the path of the dataset folder is used.
- Parameters
mode (Optional[str]) – Mode to load the dataset, or None to use the default mode.
force_update (bool) – Force an update of the dataset if possible.
with_info (bool) – Whether to return information about the dataset alongside the actual dataset(s).
**kwargs – Extra arguments for the specific mode.
- Return type
Any
- Returns
The dataset as specified by mode and the given extra arguments.
- Raises
InvalidModeError – If the given mode is not available for this dataset.
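The dispatch to load_MODE methods described for Dataset.load can be pictured with the simplified model below. It is a sketch under stated assumptions, not the library's implementation: MiniDataset, NamedDataset, the "name" mode and the local InvalidModeError are hypothetical stand-ins, and the Provider step is replaced by a root folder assumed to already hold the files.

```python
from pathlib import Path
from typing import Any, List, Optional


class InvalidModeError(Exception):
    """Stand-in for the documented exception (not the library class)."""

    def __init__(self, dataset: "MiniDataset", mode: str) -> None:
        super().__init__(
            f"Mode '{mode}' is not available for dataset '{dataset.name}'."
        )
        self.dataset = dataset
        self.mode = mode


class MiniDataset:
    """Hypothetical, simplified model of load_MODE dispatch."""

    _default_mode = "path"

    def __init__(self, name: str, root: Path) -> None:
        self.name = name
        self._root = root  # assumption: files are already present locally

    @property
    def available_modes(self) -> List[str]:
        # Every callable attribute named load_<mode> defines a mode.
        return sorted(
            attr[len("load_"):]
            for attr in dir(self)
            if attr.startswith("load_") and callable(getattr(self, attr))
        )

    def load(self, mode: Optional[str] = None, **kwargs: Any) -> Any:
        mode = self._default_mode if mode is None else mode
        if mode not in self.available_modes:
            raise InvalidModeError(self, mode)
        # Dispatch to the matching load_<mode> method with the dataset path.
        return getattr(self, f"load_{mode}")(self._root / self.name, **kwargs)

    def load_path(self, path: Path) -> Path:
        return path


class NamedDataset(MiniDataset):
    """Adds a hypothetical "name" mode and makes it the default."""

    _default_mode = "name"

    def load_name(self, path: Path, upper: bool = False) -> str:
        return path.name.upper() if upper else path.name


ds = NamedDataset("blink", Path("/data"))
print(ds.available_modes)
print(ds.load(), ds.load(upper=True), ds.load(mode="path"))
```

Discovering modes from method names keeps subclasses short: adding a mode only requires one new load_MODE method, and extra keyword arguments flow straight through to it.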
- load_path(path)
Load method for path mode.
- Parameters
path (Path) – Path of the dataset.
- Return type
Path
- Returns
The actual path to the dataset.
- exception deel.datasets.dataset.InvalidModeError(dataset, mode)
Bases: Exception
Exception raised when a mode is not available for a given dataset.
- Parameters
dataset (BaseDataset) – Dataset for which the mode is not available.
mode (str) – Mode not available.
- class deel.datasets.dataset.VolatileDataset(name, version='latest', settings=None)
Bases: BaseDataset
Dataset that is generated on the fly.
Creates a new dataset of the given name and version.
- Parameters
name (str) – Name of the dataset.
version (str) – Version of the dataset.
settings (Optional[Settings]) – The settings to use for this dataset, or None to use the default settings.
- load(mode=None, with_info=False, **kwargs)
Load this dataset as specified by mode.
This method checks that the given mode is valid and generates the dataset using the given load_MODE method.
- Parameters
mode (Optional[str]) – Mode to load the dataset, or None to use the default mode.
with_info (bool) – Whether to return information about the dataset alongside the actual dataset(s).
**kwargs – Extra arguments for the specific mode.
- Return type
Any
- Returns
The dataset as specified by mode and the given extra arguments.
- Raises
InvalidModeError – If the given mode is not available for this dataset.
- abstract load_basic()
Load method for the basic mode. Implementations generate the dataset.
- Returns
The generated dataset.
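To make the idea of a dataset generated on the fly concrete, here is a minimal sketch in which a subclass implements load_basic and nothing is read from disk. MiniVolatileDataset and RandomPoints are hypothetical stand-ins rather than library classes. The sketch assumes that "basic" is the default mode, and it raises a plain ValueError where the library documents InvalidModeError.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, List, Optional, Tuple


class MiniVolatileDataset(ABC):
    """Hypothetical stand-in: data is generated instead of read from disk."""

    _default_mode = "basic"  # assumption, not confirmed by the reference

    def load(self, mode: Optional[str] = None, **kwargs: Any) -> Any:
        mode = self._default_mode if mode is None else mode
        loader = getattr(self, f"load_{mode}", None)
        if loader is None:
            # The library documents InvalidModeError for this case.
            raise ValueError(f"mode {mode!r} is not available")
        return loader(**kwargs)

    @abstractmethod
    def load_basic(self) -> Any:
        """Generate the dataset for the "basic" mode."""


class RandomPoints(MiniVolatileDataset):
    """Generates n reproducible 2D points in the unit square."""

    def __init__(self, n: int, seed: int = 0) -> None:
        self.n = n
        self.seed = seed

    def load_basic(self) -> List[Tuple[float, float]]:
        rng = random.Random(self.seed)  # seeded for reproducibility
        return [(rng.random(), rng.random()) for _ in range(self.n)]


points = RandomPoints(3, seed=1).load()
print(len(points))
```

Declaring load_basic abstract means a subclass that forgets to implement it cannot be instantiated, which surfaces the mistake early.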
deel.datasets.settings module
- exception deel.datasets.settings.ParseSettingsError
Bases: Exception
Exception raised if an issue occurs while parsing the settings.
- class deel.datasets.settings.Settings(version, provider_list, path, default_provider='')
Bases: object
The Settings class is a read-only class that contains settings for the deel.datasets package.
Settings are stored in a YAML format. The default location for the settings file is $HOME/.deel/config.yml. The DEEL_DATASETS_CONF environment variable can be used to specify the default location of the file.
- Parameters
version (int) – Version of the settings.
provider_type – Type of the provider.
provider_options – Options for the provider.
path (Path) – Local storage path for the datasets.
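The settings file location described for the Settings class can be sketched as a small lookup: use DEEL_DATASETS_CONF when it is set, otherwise fall back to $HOME/.deel/config.yml. The function resolve_settings_path and its parameters are hypothetical. How the package performs this lookup internally is not shown in this reference, so the code only mirrors the documented behaviour.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "DEEL_DATASETS_CONF"


def resolve_settings_path(
    environ: Optional[Mapping[str, str]] = None, home: Optional[Path] = None
) -> Path:
    """Return the settings file path following the documented rule.

    `environ` and `home` are injectable so the rule can be checked
    without touching the real environment (hypothetical helper).
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else home
    override = environ.get(ENV_VAR)
    if override:
        # The environment variable takes precedence over the default.
        return Path(override)
    return home / ".deel" / "config.yml"


print(resolve_settings_path({}, Path("/home/username")))
print(resolve_settings_path({ENV_VAR: "/etc/deel/config.yml"}))
```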
- get_best_provider(dataset)
Searches for and returns the best settings provider. If a default provider is defined in the configuration file, it is returned. Otherwise, if dataset is not None, the first settings provider that contains this dataset is returned. Otherwise, the local settings provider is returned.
- Parameters
dataset (str) – Dataset name.
- Return type
- Returns
The provider to use
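The three-step fallback described for get_best_provider can be sketched as follows. This is an illustration under assumptions, not the method's actual code: each provider is modelled as a set of dataset names, the function returns a provider name rather than a provider object, and pick_provider and the "local" name are hypothetical.

```python
from typing import Dict, Optional, Set


def pick_provider(
    providers: Dict[str, Set[str]],
    dataset: Optional[str],
    default_provider: str = "",
    local_name: str = "local",
) -> str:
    """Apply the documented order: default, first match, then local."""
    if default_provider:
        # 1. A configured default provider always wins.
        return default_provider
    if dataset is not None:
        # 2. Otherwise take the first provider that contains the dataset
        #    (dicts preserve insertion order).
        for name, datasets in providers.items():
            if dataset in datasets:
                return name
    # 3. Fall back to the local provider.
    return local_name


providers = {"gcloud": {"blink"}, "webdav": {"blink", "acas"}}
print(pick_provider(providers, "acas"))
print(pick_provider(providers, "unknown"))
```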
- get_provider_list()
- Return type
Dict[str,SettingsProvider]
- property local_storage: Path
The path to the local storage for the datasets.
- Return type
Path
- class deel.datasets.settings.SettingsProvider(provider_type, provider_options)
Bases: object
- Parameters
provider_type (str) – Type of the provider.
provider_options (Dict[str,Any]) – Options for the provider.
- deel.datasets.settings.get_default_settings(default_provider='')
Retrieve the default settings for the current machine.
- Parameters
default_provider (str) – Optional. The default provider to use.
- Return type
- Returns
The default settings for the current machine.
- deel.datasets.settings.get_settings_for_local()
Retrieve the local default settings.
- Return type
- Returns
The settings for local.
- deel.datasets.settings.read_one_provider(data, version)
Load Settings from the given dictionary (YAML stream).
- Parameters
data (Dict[str,Any]) – Dictionary for one settings element of the YAML file.
- Return type
- Returns
A Settings object constructed from the given data.
- Raises
yaml.YAMLError – If the given stream does not contain valid YAML.
ParseSettingsError – If the given YAML is not valid for settings.
- deel.datasets.settings.read_settings(stream, default_provider='')
Load Settings from the given YAML stream.
- Parameters
stream (TextIO) – File-like object containing the configuration.
default_provider (str) – Default provider to use.
- Return type
- Returns
A Settings object constructed from the given YAML stream.
- Raises
yaml.YAMLError – If the given stream does not contain valid YAML.
ParseSettingsError – If the given YAML is not valid for settings.
- deel.datasets.settings.write_settings(settings, stream, **kwargs)
Write the given Settings to the given stream as YAML.
- Parameters
settings (Settings) – Settings to write.
stream (TextIO) – File-like object where the configuration will be written.
**kwargs – Extra arguments for the yaml.safe_dump method.