deel.datasets.providers package

deel.datasets.providers.make_provider(provider_type, root_path, provider_options={})

Create a new provider using the given arguments.

Parameters
  • provider_type (str) – Type of the provider.

  • root_path (Path) – Local path for the datasets.

  • provider_options (Dict[str, Any]) – Extra options to pass to the provider constructor.

Return type

Provider

Returns

A provider corresponding to the given arguments.

Raises

ValueError – If the given provider_type is invalid or if the given options do not match the given provider.
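The dispatch described above can be pictured with a small, self-contained sketch. It does not import deel.datasets: the registry, the DemoLocalProvider class and the make_demo_provider function are hypothetical stand-ins, and the provider_type values the library actually accepts are not listed in this section.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional


class DemoLocalProvider:
    """Hypothetical stand-in for a provider class (not the library's)."""

    def __init__(self, root_folder: Path):
        self.root_folder = Path(root_folder)


# Hypothetical registry mapping a provider type to its constructor.
DEMO_REGISTRY: Dict[str, Callable[..., Any]] = {"local": DemoLocalProvider}


def make_demo_provider(
    provider_type: str,
    root_path: Path,
    provider_options: Optional[Dict[str, Any]] = None,
) -> Any:
    """Build a provider, raising ValueError on a bad type or bad options."""
    options = dict(provider_options or {})
    try:
        constructor = DEMO_REGISTRY[provider_type]
    except KeyError:
        raise ValueError(f"Invalid provider type: {provider_type!r}") from None
    try:
        return constructor(root_path, **options)
    except TypeError as exc:
        # Options that the constructor does not accept end up here.
        raise ValueError(f"Invalid options for {provider_type!r}: {exc}") from exc


provider = make_demo_provider("local", Path("/tmp/datasets"))
```

Both failure modes surface as the same exception type, so a caller only needs one handler to reject a bad configuration.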

Submodules

deel.datasets.providers.exceptions module

exception deel.datasets.providers.exceptions.DatasetNotFoundError(name)

Bases: Exception

Exception thrown by providers when the requested dataset is not found.

Parameters

name (str) – Name of the dataset not found.

exception deel.datasets.providers.exceptions.DatasetVersionNotFoundError(name, version)

Bases: DatasetNotFoundError, VersionNotFoundError

Exception thrown by providers when the requested dataset version is not found.

This exception is meant to be more specific than DatasetNotFoundError.

Parameters
  • name (str) – Name of the dataset not found.

  • version (str) – Version of the dataset not found.
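Because DatasetVersionNotFoundError derives from both DatasetNotFoundError and VersionNotFoundError, a handler written for either base also catches it. The sketch below redeclares stand-in classes with the documented names and bases so that it runs on its own; the constructor bodies, the messages and the describe helper are assumptions.

```python
from typing import Optional


class DatasetNotFoundError(Exception):
    """Stand-in: documented name and base, assumed body."""

    def __init__(self, name: str):
        Exception.__init__(self, f"Dataset {name} not found.")
        self.name = name


class VersionNotFoundError(Exception):
    """Stand-in: documented name and base, assumed body."""

    def __init__(self, version: Optional[str] = None):
        Exception.__init__(self, f"Version {version} not found.")
        self.version = version


class DatasetVersionNotFoundError(DatasetNotFoundError, VersionNotFoundError):
    """Stand-in: documented name and bases, assumed body."""

    def __init__(self, name: str, version: str):
        Exception.__init__(self, f"Version {version} of dataset {name} not found.")
        self.name = name
        self.version = version


def describe(exc: Exception) -> str:
    """Hypothetical helper: the most specific handler must come first."""
    try:
        raise exc
    except DatasetVersionNotFoundError as e:
        return f"missing version {e.version} of {e.name}"
    except DatasetNotFoundError as e:
        return f"missing dataset {e.name}"
    except VersionNotFoundError as e:
        return f"missing version {e.version}"
```

If the DatasetNotFoundError handler were listed first, it would also swallow the more specific exception and the version information would go unused.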

exception deel.datasets.providers.exceptions.InvalidConfigurationError

Bases: Exception

Exception raised if the provider configuration is invalid.

exception deel.datasets.providers.exceptions.ProviderNotAvailableError

Bases: Exception

Exception raised if the provider is not available.

exception deel.datasets.providers.exceptions.VersionNotFoundError(version=None)

Bases: Exception

Exception thrown by providers when the requested version is not found.

Parameters

version (Optional[str]) – Version of the dataset not found (or a version selector).

deel.datasets.providers.ftp_providers module

class deel.datasets.providers.ftp_providers.FtpProvider(root_folder, remote_url, authenticator=None, **kwargs)

Bases: RemoteProvider

The FtpProvider is a RemoteProvider associated with an FTP server.

Parameters
  • root_folder (PathLike) – Root folder in which to look up datasets.

  • remote_url (str) – Remote URL of the FTP server.

  • authenticator (Optional[FtpSimpleAuthenticator]) – Authenticator to use.

  • **kwargs – Extra arguments for the FTP constructor.

list_datasets()

List the available datasets for this provider.

Return type

List[str]

Returns

The list of datasets available for this provider.

list_versions(dataset)

List the available versions of the given dataset for this provider.

Return type

List[str]

Returns

The list of available versions of the given dataset for this provider.

Raises

DatasetNotFoundError – If the given dataset does not exist.

class deel.datasets.providers.ftp_providers.FtpRemoteFile(client, remote_path, local_path)

Bases: RemoteFile

Class representing a remote file for the FTP provider.

Parameters
  • client (FTP) – The FTP client (used for download).

  • remote_path (Path) – Remote path to the dataset file, relative to the root of the FTP server.

  • local_path (Path) – Local path to the file of the dataset, relative to the dataset folder.

download(local_file)

Download this file from the remote storage to the local path.

Parameters

local_file (Path) – Local path where the file should be downloaded.

property relative_path: Path

Returns: The path of this file relative to the dataset and version it belongs to.

Return type

Path

class deel.datasets.providers.ftp_providers.FtpSimpleAuthenticator(username, password)

Bases: object

Authenticator for a simple FTP authentication with a username and a password.

Parameters
  • username (str) – Username to use for authentication.

  • password (str) – Password to use for authentication.

property password

Returns: The password to use for authentication.

property username

Returns: The username to use for authentication.

class deel.datasets.providers.ftp_providers.FtpSingleFileProvider(root_folder, remote_url, name, version='1.0.0', authenticator=None, **kwargs)

Bases: RemoteSingleFileProvider, FtpProvider

The FtpSingleFileProvider is a RemoteSingleFileProvider that serves a single file from an FTP server.

This provider does not currently support encrypted connections.

Parameters
  • root_folder (PathLike) – Root folder in which to look up datasets.

  • remote_url (str) – Remote URL of the file to serve.

  • name (str) – Name of the dataset corresponding to the remote file.

  • version (str) – Version of the dataset corresponding to the remote file.

  • authenticator (Optional[FtpSimpleAuthenticator]) – Authenticator to use.

  • **kwargs – Extra arguments for the FTP constructor.

deel.datasets.providers.gcloud_provider module

The GCloudProvider is a simple alias for LocalProvider.

class deel.datasets.providers.gcloud_provider.GCloudProvider(disk)

Bases: LocalProvider

The GCloudProvider is a simple alias for LocalProvider.

Parameters

root_folder – Root folder in which to look up datasets.

deel.datasets.providers.http_providers module

class deel.datasets.providers.http_providers.HttpMultiFilesProvider(root_folder, remote_url_list, name, version='1.0.0', authenticator=None)

Bases: RemoteProvider

This provider is a RemoteProvider that can serve a list of files over the HTTP protocol.

Parameters
  • root_folder (PathLike) – Root folder in which to look up datasets.

  • remote_url_list – Remote URLs of the files to serve.

  • name (str) – Name of the dataset corresponding to the remote files.

  • version (str) – Version of the dataset corresponding to the remote files.

  • authenticator (Optional[HttpSimpleAuthenticator]) – Authenticator to use.

list_datasets()

List the available datasets for this provider.

Return type

List[str]

Returns

The list of datasets available for this provider.

list_versions(dataset)

List the available versions of the given dataset for this provider.

Return type

List[str]

Returns

The list of available versions of the given dataset for this provider.

Raises

DatasetNotFoundError – If the given dataset does not exist.

class deel.datasets.providers.http_providers.HttpRemoteFile(remote_url, relative_path)

Bases: RemoteFile

Class representing a remote file for the HTTP providers.

Parameters
  • remote_url (str) – Remote URL of the file.

  • relative_path (Path) – Relative path to the file from the dataset folder.

download(local_file)

Download this file from the remote storage to the local path.

Parameters

local_file (Path) – Local path where the file should be downloaded.

property relative_path: Path

Returns: The path of this file relative to the dataset and version it belongs to.

Return type

Path

class deel.datasets.providers.http_providers.HttpSimpleAuthenticator(username, password)

Bases: object

Authenticator for a simple HTTP authentication with a username and a password.

Parameters
  • username (str) – Username to use for authentication.

  • password (str) – Password to use for authentication.

property password

Returns: The password to use for authentication.

property username

Returns: The username to use for authentication.

class deel.datasets.providers.http_providers.HttpSingleFileProvider(root_folder, remote_url, name, version='1.0.0', authenticator=None)

Bases: HttpMultiFilesProvider

This provider is a RemoteProvider that can only serve a single file over the HTTP protocol.

Parameters
  • root_folder (PathLike) – Root folder in which to look up datasets.

  • remote_url (str) – Remote URL of the file to serve.

  • name (str) – Name of the dataset corresponding to the remote file.

  • version (str) – Version of the dataset corresponding to the remote file.

  • authenticator (Optional[HttpSimpleAuthenticator]) – Authenticator to use.

deel.datasets.providers.local_as_provider module

class deel.datasets.providers.local_as_provider.LocalAsProvider(root_folder, source_folder)

Bases: RemoteProvider

The LocalAsProvider is a Provider associated with a local source of datasets.

Parameters
  • root_folder (PathLike) – Root folder in which to look up datasets.

  • source_folder (PathLike) – Local source directory of the datasets.

list_datasets()

List the available datasets for this provider.

Return type

List[str]

Returns

The list of datasets available for this provider.

list_versions(dataset)

List the available versions of the given dataset for this provider.

Return type

List[str]

Returns

The list of available versions of the given dataset for this provider.

Raises

DatasetNotFoundError – If the given dataset does not exist.

class deel.datasets.providers.local_as_provider.LocalFile(dataset_path, source_path)

Bases: RemoteFile

Represents a local file as a file served by the local provider.

download(local_file)

Copy this file from the local provider directory to the local path.

Parameters

local_file (Path) – Local path where the file should be copied.

property relative_path: Path

Returns: The path of this file relative to the local provider directory and version it belongs to.

Return type

Path

property size: int

Returns: The size of the file, in bytes.

Return type

int

property source_path: Path

Returns: The full path to the source file.

Return type

Path

deel.datasets.providers.local_provider module

class deel.datasets.providers.local_provider.LocalProvider(root_folder)

Bases: Provider

A LocalProvider is a provider that looks up datasets in a local location (a folder).

Parameters

root_folder (PathLike) – Root folder in which to look up datasets.

del_folder(name, version, keep_dataset=False)

Delete the folder corresponding to the given dataset version. If no versions remain after the deletion, the dataset folder is also removed, unless keep_dataset is True.

Parameters
  • name (str) – Name of the dataset to delete.

  • version (str) – Version of the dataset to delete.

  • keep_dataset (bool) – True to keep the dataset folder even when no versions remain.
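The deletion rule above can be sketched with the standard library alone. The delete_version helper and the <root>/<name>/<version>/ folder layout are assumptions made for illustration; this section does not state how LocalProvider organises its folders on disk.

```python
import shutil
import tempfile
from pathlib import Path


def delete_version(root: Path, name: str, version: str, keep_dataset: bool = False) -> None:
    """Hypothetical helper assuming a <root>/<name>/<version>/ layout."""
    dataset_dir = root / name
    shutil.rmtree(dataset_dir / version)
    # Drop the now-empty dataset folder unless the caller asked to keep it.
    if not keep_dataset and not any(dataset_dir.iterdir()):
        dataset_dir.rmdir()


# Demonstration in a throwaway directory.
root = Path(tempfile.mkdtemp())
(root / "mnist" / "1.0.0").mkdir(parents=True)
(root / "mnist" / "1.1.0").mkdir()

delete_version(root, "mnist", "1.0.0")
still_there = (root / "mnist").exists()  # another version remains

delete_version(root, "mnist", "1.1.0")
removed = not (root / "mnist").exists()  # last version gone, folder removed

shutil.rmtree(root)
```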

get_folder(name, version='latest', force_update=False, returns_version=False)

Retrieve the root folder for the given dataset.

Parameters
  • name (str) – Name of the dataset to retrieve the folder for.

  • version (str) – Version of the dataset to retrieve the folder for. Can be an exact version like “3.1.2”, or “latest”, or a wildcard, e.g., “3.1.*”.

  • force_update (bool) – Force the update of the local dataset if possible. May have no effect on some providers.

  • returns_version (bool) – If True, the exact version of the dataset will be returned along with the path.

Return type

Union[Path, Tuple[Path, str]]

Returns

A path to the root folder for the given dataset name, or a tuple containing the path and the exact version.

list_datasets()

List the available datasets for this provider.

Return type

List[str]

Returns

The list of datasets available for this provider.

list_versions(dataset)

List the available versions of the given dataset for this provider.

Return type

List[str]

Returns

The list of available versions of the given dataset for this provider.

Raises

DatasetNotFoundError – If the given dataset does not exist.

property root_folder: Path

Returns: The local path to root folder for the datasets.

Return type

Path

deel.datasets.providers.provider module

class deel.datasets.providers.provider.Provider

Bases: ABC

The Provider class is an abstract interface for classes that provide access to dataset storage.

The methods that all child classes must override are marked with the abc.abstractmethod decorator. If a class requires specific clean-up, the __enter__ and __exit__ special methods can be overridden.

abstract get_folder(name, version='latest', force_update=False, returns_version=False)

Retrieve the root folder for the given dataset.

Parameters
  • name (str) – Name of the dataset to retrieve the folder for.

  • version (str) – Version of the dataset to retrieve the folder for. Can be an exact version like “3.1.2”, or “latest”, or a wildcard, e.g., “3.1.*”.

  • force_update (bool) – Force the update of the local dataset if possible. May have no effect on some providers.

  • returns_version (bool) – If True, the exact version of the dataset will be returned along with the path.

Return type

Union[Path, Tuple[Path, str]]

Returns

A path to the root folder for the given dataset name, or a tuple containing the path and the exact version.

get_version(version, versions)

Retrieve the version from the list of versions that best matches the given one.

Parameters
  • version (str) – Version to retrieve. Can be an exact version, e.g., “3.1.2”, a wildcard, e.g., “3.1.*”, or “latest”.

  • versions (List[str]) – List of versions to retrieve the version from. Versions should all be of the form x.y.z.

Return type

str

Returns

The version in versions that best matches version.
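One way to picture the three selector forms (exact, wildcard and “latest”) is the sketch below. It is not the library's implementation: pick_version is a hypothetical helper, choosing the highest matching version is an assumption, and LookupError stands in for whatever the real method raises when nothing matches.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def _key(version: str) -> Tuple[int, ...]:
    """Turn 'x.y.z' into (x, y, z) so that 1.10.0 sorts after 1.9.0."""
    return tuple(int(part) for part in version.split("."))


def pick_version(version: str, versions: List[str]) -> str:
    """Hypothetical helper: return the highest entry matching the selector."""
    pattern = "*" if version == "latest" else version
    candidates = [v for v in versions if fnmatchcase(v, pattern)]
    if not candidates:
        # LookupError is a stand-in chosen for this sketch only.
        raise LookupError(f"No version matches {version!r}")
    return max(candidates, key=_key)


available = ["0.3.1", "1.9.0", "1.9.2", "1.10.0"]
latest = pick_version("latest", available)
patched = pick_version("1.9.*", available)
exact = pick_version("0.3.1", available)
```

Comparing versions as integer tuples rather than as strings matters here: a plain string comparison would rank “1.9.2” above “1.10.0”.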

abstract list_datasets()

List the available datasets for this provider.

Return type

List[str]

Returns

The list of datasets available for this provider.

abstract list_versions(dataset)

List the available versions of the given dataset for this provider.

Return type

List[str]

Returns

The list of available versions of the given dataset for this provider.

Raises

DatasetNotFoundError – If the given dataset does not exist.

deel.datasets.providers.remote_provider module

class deel.datasets.providers.remote_provider.FileModifier

Bases: ABC

Abstract class representing a modifier to apply to the files downloaded by a remote provider.

accept(file)

Check if the given file can be modified by this modifier.

Parameters

file (Path) – The file to check.

Returns: True if the file can be modified, False otherwise.

Return type

bool

apply(file)

Apply this modifier to the given file.

Parameters

file (Path) – The file to apply the modifier to.

Raises

FileNotFoundError – If the file does not exist.

class deel.datasets.providers.remote_provider.GzExtractor

Bases: FileModifier

Modifier that extracts files from gz archives and deletes the archives afterwards.

accept(file)

Check if the given file can be modified by this modifier.

Parameters

file (Path) – The file to check.

Returns: True if the file can be modified, False otherwise.

Return type

bool

apply(file)

Apply this modifier to the given file.

Parameters

file (Path) – The file to apply the modifier to.

Raises

FileNotFoundError – If the file does not exist.
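The accept/apply pair can be illustrated with a stand-alone sketch built on the standard gzip module. DemoGzExtractor is a hypothetical class, not the library's GzExtractor; deciding acceptance from the file suffix and naming the output by dropping that suffix are both assumptions.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


class DemoGzExtractor:
    """Hypothetical stand-in mirroring the documented accept/apply pair."""

    def accept(self, file: Path) -> bool:
        # Assumption: acceptance is decided from the file suffix alone.
        return file.suffix == ".gz"

    def apply(self, file: Path) -> None:
        if not file.exists():
            raise FileNotFoundError(file)
        target = file.with_suffix("")  # "data.csv.gz" -> "data.csv"
        with gzip.open(file, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        file.unlink()  # the archive is deleted once extracted


# Build a small archive in a throwaway directory, then run the modifier on it.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "data.csv.gz"
with gzip.open(archive, "wb") as fh:
    fh.write(b"a,b\n1,2\n")

modifier = DemoGzExtractor()
if modifier.accept(archive):
    modifier.apply(archive)
extracted = (workdir / "data.csv").read_bytes()
```

Keeping the check and the action separate lets a caller hold a list of modifiers and run each file through only those that accept it.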

class deel.datasets.providers.remote_provider.RemoteFile

Bases: object

Abstraction representing a remote file.

abstract download(local_file)

Download this file from the remote storage to the local path.

Parameters

local_file (Path) – Local path where the file should be downloaded.

abstract property relative_path: Path

Returns: The path of this file relative to the dataset and version it belongs to.

Return type

Path

class deel.datasets.providers.remote_provider.RemoteProvider(root_folder, remote_url)

Bases: LocalProvider

The RemoteProvider extends LocalProvider by fetching datasets from a remote server if they are not found on the local storage.

If a dataset is not found locally (or a forced download is requested), the provider first downloads all the files corresponding to the given dataset, and then extracts all archived files (.zip, .gz, .tgz) in the local folder.

Parameters
  • root_folder (PathLike) – Root folder in which to look up datasets.

  • remote_url (str) – Remote URL of the server.

get_folder(name, version='latest', force_update=False, returns_version=False)

Retrieve the root folder for the given dataset.

Parameters
  • name (str) – Name of the dataset to retrieve the folder for.

  • version (str) – Version of the dataset to retrieve the folder for. Can be an exact version like “3.1.2”, or “latest”, or a wildcard, e.g., “3.1.*”.

  • force_update (bool) – Force the update of the local dataset if possible. May have no effect on some providers.

  • returns_version (bool) – If True, the exact version of the dataset will be returned along with the path.

Return type

Union[Path, Tuple[Path, str]]

Returns

A path to the root folder for the given dataset name, or a tuple containing the path and the exact version.

local_provider()

Creates and returns a LocalProvider corresponding to the local storage for this provider.

Return type

LocalProvider

Returns

A LocalProvider that fetches datasets from the local folder this provider stores the datasets to.

modifiers: List[FileModifier] = [<deel.datasets.providers.remote_provider.ZipExtractor object>, <deel.datasets.providers.remote_provider.TarZExtractor object>, <deel.datasets.providers.remote_provider.GzExtractor object>]

property remote_url: str

Returns: The remote URL from which the datasets are fetched.

Return type

str
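The local-first behaviour of RemoteProvider (reuse the local copy, download only when it is missing or when an update is forced) can be sketched as follows. Everything here is hypothetical: get_folder_sketch is not a library function, the remote_files dictionary stands in for a remote server, the <root>/<name>/<version>/ layout is assumed, and the archive extraction step is left out.

```python
import tempfile
from pathlib import Path
from typing import Dict


def get_folder_sketch(
    root: Path,
    name: str,
    version: str,
    remote_files: Dict[str, bytes],
    force_update: bool = False,
) -> Path:
    """Hypothetical flow: reuse the local copy unless missing or forced."""
    folder = root / name / version
    if folder.exists() and not force_update:
        return folder
    folder.mkdir(parents=True, exist_ok=True)
    for relative_path, payload in remote_files.items():
        target = folder / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)  # stands in for a RemoteFile download
    return folder


root = Path(tempfile.mkdtemp())
remote = {"train/x.bin": b"v1"}

first = get_folder_sketch(root, "demo", "1.0.0", remote)
remote["train/x.bin"] = b"v2"  # the remote content changes

get_folder_sketch(root, "demo", "1.0.0", remote)  # local copy reused
cached = (first / "train/x.bin").read_bytes()

get_folder_sketch(root, "demo", "1.0.0", remote, force_update=True)
refreshed = (first / "train/x.bin").read_bytes()
```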

class deel.datasets.providers.remote_provider.RemoteSingleFileProvider(root_folder, remote_url, name, version='1.0.0')

Bases: RemoteProvider

The RemoteSingleFileProvider extends RemoteProvider and should be used to fetch files from custom web servers (HTTP, FTP) that only provide a single file. The only methods that should be implemented are _is_available and _list_remote_files.

The goal of this class is mainly to be used to allow the creation of datasets from publicly available files.

Parameters
  • root_folder (PathLike) – Root folder in which to look up datasets.

  • remote_url (str) – Remote URL of the file to serve.

  • name (str) – Name of the dataset corresponding to the remote file.

  • version (str) – Version of the dataset corresponding to the remote file.

list_datasets()

List the available datasets for this provider.

Return type

List[str]

Returns

The list of datasets available for this provider.

list_versions(dataset)

List the available versions of the given dataset for this provider.

Return type

List[str]

Returns

The list of available versions of the given dataset for this provider.

Raises

DatasetNotFoundError – If the given dataset does not exist.
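Since a single-file provider is built from one dataset name and one version, its two listing methods plausibly reduce to one-element lists. The sketch below illustrates that reading with a hypothetical SingleFileCatalog class and a stand-in exception; it is an assumption about the behaviour, not the library's code.

```python
from typing import List


class DatasetNotFoundError(Exception):
    """Stand-in for the documented exception of the same name."""


class SingleFileCatalog:
    """Hypothetical illustration, not the library class."""

    def __init__(self, name: str, version: str = "1.0.0"):
        self._name = name
        self._version = version

    def list_datasets(self) -> List[str]:
        # Only one dataset is ever exposed.
        return [self._name]

    def list_versions(self, dataset: str) -> List[str]:
        if dataset != self._name:
            raise DatasetNotFoundError(dataset)
        return [self._version]


catalog = SingleFileCatalog("iris")
datasets = catalog.list_datasets()
versions = catalog.list_versions("iris")
```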

class deel.datasets.providers.remote_provider.TarZExtractor

Bases: FileModifier

Modifier that extracts files from tar archives, with or without compression, and deletes the archives afterwards.

See the tarfile library for the list of supported compression methods.

accept(file)

Check if the given file can be modified by this modifier.

Parameters

file (Path) – The file to check.

Returns: True if the file can be modified, False otherwise.

Return type

bool

apply(file)

Apply this modifier to the given file.

Parameters

file (Path) – The file to apply the modifier to.

Raises

FileNotFoundError – If the file does not exist.

class deel.datasets.providers.remote_provider.ZipExtractor

Bases: FileModifier

Modifier that unzips files and deletes the archives afterwards.

accept(file)

Check if the given file can be modified by this modifier.

Parameters

file (Path) – The file to check.

Returns: True if the file can be modified, False otherwise.

Return type

bool

apply(file)

Apply this modifier to the given file.

Parameters

file (Path) – The file to apply the modifier to.

Raises

FileNotFoundError – If the file does not exist.

deel.datasets.providers.webdav_provider module

class deel.datasets.providers.webdav_provider.WebDavAuthenticator

Bases: ABC

Base class for WebDAV authenticators.

Authenticator classes are used for dispatching and for storing any parameters required for authentication.

class deel.datasets.providers.webdav_provider.WebDavProvider(root_folder, remote_url, remote_path='', authenticator=None)

Bases: RemoteProvider

The WebDavProvider is a RemoteProvider associated with a WebDAV server.

Parameters
  • root_folder (PathLike) – Root folder in which to look up datasets.

  • remote_url (str) – Remote URL of the WebDAV server.

  • authenticator (Optional[WebDavAuthenticator]) – Authenticator to use.

VERSION_REGEX: Pattern = re.compile('[0-9]+[.][0-9]+[.][0-9]+')

list_datasets()

List the available datasets for this provider.

Return type

List[str]

Returns

The list of datasets available for this provider.

list_versions(dataset)

List the available versions of the given dataset for this provider.

Return type

List[str]

Returns

The list of available versions of the given dataset for this provider.

Raises

DatasetNotFoundError – If the given dataset does not exist.
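The VERSION_REGEX pattern shown for WebDavProvider recognises strings of the form x.y.z. The sketch below copies the pattern and uses it to filter a list of names; the keep_versions helper, the sample entries and the choice of fullmatch (rather than match or search) are assumptions, since this section does not say how the provider applies the pattern.

```python
import re
from typing import List

# Same pattern as the documented VERSION_REGEX attribute.
VERSION_REGEX = re.compile("[0-9]+[.][0-9]+[.][0-9]+")


def keep_versions(entries: List[str]) -> List[str]:
    """Hypothetical helper: keep entries that are exactly of the form x.y.z."""
    return [entry for entry in entries if VERSION_REGEX.fullmatch(entry)]


found = keep_versions(["1.0.0", "2.10.3", "latest", "1.0", "v1.2.3", "README.md"])
```

With fullmatch, a name such as “v1.2.3” is rejected; with search it would be accepted, so the choice changes which remote folders count as versions.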

class deel.datasets.providers.webdav_provider.WebDavRemoteFile(client, dataset_path, file_path)

Bases: RemoteFile

Class representing a remote file for the WebDAV provider.

Parameters
  • client (Client) – The WebDAV client (used for download).

  • dataset_path (str) – Path to the dataset, relative to the root of the server.

  • file_path (str) – Path to the file of the dataset, relative to the dataset path.

download(local_file)

Download this file from the remote storage to the local path.

Parameters

local_file (Path) – Local path where the file should be downloaded.

property relative_path: Path

Returns: The path of this file relative to the dataset and version it belongs to.

Return type

Path

class deel.datasets.providers.webdav_provider.WebDavSimpleAuthenticator(username, password)

Bases: WebDavAuthenticator

Authenticator for a simple HTTP authentication with a username and a password.

Parameters
  • username (str) – Username to use for authentication.

  • password (str) – Password to use for authentication.

property password

Returns: The password to use for authentication.

property username

Returns: The username to use for authentication.