Welcome to DEEL Dataset Manager’s documentation!
This project aims to ease the installation and usage of self-hosted and proprietary datasets in artificial intelligence projects.
Installation
You can install the manager directly from PyPI:
# Note: This currently does not work, see the README.
pip install deel-datasets
Configuration
The configuration file specifies how datasets should be downloaded, or indicates that they do not have to be downloaded (e.g. on Google Cloud).
The configuration file should be at $HOME/.deel/config.yml. On Windows systems it is C:\Users\$USERNAME\.deel\config.yml, unless you have set the HOME environment variable.
The DEEL_CONFIGURATION_FILE environment variable can be used to specify the location of the configuration file if you do not want to use the default one.
The configuration file is a YAML file, see Configuration for more details.
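The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the documented rules, not the manager's actual code: the function name resolve_config_path and its env parameter are hypothetical, introduced here so the logic can be exercised without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return where the configuration file is expected (hypothetical helper)."""
    env = os.environ if env is None else env

    # 1. An explicit DEEL_CONFIGURATION_FILE always takes precedence.
    override = env.get("DEEL_CONFIGURATION_FILE")
    if override:
        return Path(override)

    # 2. Otherwise use HOME if it is set, falling back to the user's
    #    home directory (C:\Users\<name> on a typical Windows setup).
    home = env.get("HOME")
    base = Path(home) if home else Path.home()
    return base / ".deel" / "config.yml"


if __name__ == "__main__":
    print(resolve_config_path())
```

Running the script prints the path that the rules above would select on the current machine.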
DEEL dataset plugin
Without plugins, the manager can only download a dataset and return the path to the local folder containing it. Installing plugins gives you access to automatic dataset loading and data pre-processing.
Plugins are Python packages with proper entry points. See Plugins for more information on how to create plugins.
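To check which entry points are visible in the current environment, the standard importlib.metadata module can be used. The sketch below is generic: the helper name list_entry_points is hypothetical, and the entry-point group used by the manager is not assumed here (see Plugins for the actual group name).

```python
from importlib.metadata import entry_points


def list_entry_points(group):
    """Return the sorted names of installed entry points in the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: a plain dict mapping group name to entry points.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    # "console_scripts" is a standard group, used here only as a demonstration.
    print(list_entry_points("console_scripts"))
```

Passing the group documented in Plugins instead of console_scripts would list the dataset plugins installed alongside the manager.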
Basic usage
To load a dataset, you can simply do:
import deel.datasets
# Load the dataset-a dataset in its default mode:
dataset = deel.datasets.load("dataset-a")
# Load the tensorflow version of the dataset-b dataset (default mode for dataset-b):
dataset = deel.datasets.load("dataset-b")
# Load the pytorch version of the dataset-b dataset:
dataset = deel.datasets.load("dataset-b", mode="pytorch")
The deel.datasets.load() function is the main entry point for accessing datasets.
The mode argument can be used to load different “versions” of a dataset. By default,
only the path mode is available; it returns the path to the local folder
containing the dataset.
Installing plugins makes new modes available for each dataset (see Plugins).
By passing with_info=True, extra information can be retrieved as a Python
dictionary. This information is not standardized, so each dataset may provide
different keys:
import deel.datasets
# Load the tensorflow version of the dataset-b dataset, along with its information dictionary:
dataset, info = deel.datasets.load("dataset-b", mode="tensorflow", with_info=True)
print(info["classes"])
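When only the path mode is available, the returned value is a folder path and reading the files is left to you. As a minimal sketch of a first inspection step, the hypothetical helper below (summarize_folder is not part of the manager) counts the files in such a folder by extension.

```python
from collections import Counter
from pathlib import Path


def summarize_folder(dataset_path):
    """Count the files under dataset_path, grouped by lowercase extension."""
    counts = Counter()
    for entry in Path(dataset_path).rglob("*"):
        if entry.is_file():
            # Files without an extension are grouped under "<none>".
            counts[entry.suffix.lower() or "<none>"] += 1
    return dict(counts)


# Intended usage, assuming the manager is installed and configured:
#   import deel.datasets
#   dataset_path = deel.datasets.load("dataset-a", mode="path")
#   print(summarize_folder(dataset_path))
```

The helper itself only depends on the standard library, so it can be tried on any local folder.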
The function can take extra parameters depending on the chosen dataset and mode.
For instance, you can specify the percentage of training data for the dataset-b
dataset:
import deel.datasets
# Load the tensorflow version of the dataset-b dataset:
dataset = deel.datasets.load("dataset-b", mode="tensorflow", percent_train=60)
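The way a mode name and extra keyword arguments can reach a dataset-specific loader is sketched below. This is a toy model under stated assumptions, not the manager's implementation: load_with_mode, path_loader, split_loader and the modes dictionary are all hypothetical names.

```python
from typing import Any, Callable, Dict


def load_with_mode(path: str, modes: Dict[str, Callable[..., Any]],
                   mode: str = "path", **kwargs: Any) -> Any:
    """Dispatch to the loader registered for `mode`, forwarding extra arguments."""
    if mode not in modes:
        raise ValueError(f"Unknown mode {mode!r}; available: {sorted(modes)}")
    return modes[mode](path, **kwargs)


def path_loader(path: str) -> str:
    # Default behaviour: just hand back the folder path.
    return path


def split_loader(path: str, percent_train: int = 80) -> Dict[str, Any]:
    # Toy loader: records the requested train/test percentages.
    return {"path": path, "train": percent_train, "test": 100 - percent_train}


# Hypothetical registry of modes for one dataset.
modes = {"path": path_loader, "split": split_loader}

if __name__ == "__main__":
    print(load_with_mode("/data/dataset-b", modes))
    print(load_with_mode("/data/dataset-b", modes, mode="split", percent_train=60))
```

In this model, an argument such as percent_train is only meaningful for loaders that declare it, which mirrors the statement that the accepted parameters depend on the dataset and mode.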
Uninstalling
To uninstall the DEEL dataset manager package, simply run pip uninstall:
pip uninstall deel-datasets
Contents:
- Configuration
- Plugins
- Command Line
- deel.datasets package
  - Subpackages
    - deel.datasets.providers package
      - Submodules
      - deel.datasets.providers.exceptions module
      - deel.datasets.providers.ftp_providers module
      - deel.datasets.providers.gcloud_provider module
      - deel.datasets.providers.http_providers module
      - deel.datasets.providers.local_as_provider module
      - deel.datasets.providers.local_provider module
      - deel.datasets.providers.provider module
      - deel.datasets.providers.remote_provider module
      - deel.datasets.providers.webdav_provider module
    - deel.datasets.utils package
  - Submodules
  - deel.datasets.dataset module
  - deel.datasets.settings module