Command Line

The deel-datasets package provides command-line utilities, which can be invoked with:

python -m deel.datasets ARGS...

The --help option lists all the options and sub-commands of the command-line program. By default, the program reads its configuration from $HOME/.deel/config.yml (or from the file specified by the corresponding environment variable), but the -c argument can be used to specify a custom configuration file.
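The configuration lookup described above can be sketched in Python as follows. This is a minimal sketch, not the package's actual code. The function name resolve_config_path is hypothetical. Because the name of the environment variable is not given here, its value is passed in as env_config. The precedence (-c first, then the environment variable, then the default path) is an assumption based on the description above.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(cli_config: Optional[str] = None,
                        env_config: Optional[str] = None,
                        home: Optional[str] = None) -> Path:
    """Pick the configuration file (hypothetical helper, assumed precedence).

    1. the path given with -c / --config,
    2. the path taken from the configuration environment variable,
    3. the default $HOME/.deel/config.yml.
    """
    if cli_config:  # an explicit -c / --config argument always wins
        return Path(cli_config)
    if env_config:  # value of the (unnamed here) environment variable
        return Path(env_config)
    base = Path(home) if home is not None else Path.home()
    return base / ".deel" / "config.yml"
```

For example, resolve_config_path(home="/home/alice") yields /home/alice/.deel/config.yml, while passing cli_config="my-config.yml" returns that path regardless of the other two arguments.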

$ python -m deel.datasets --help
usage: __main__.py [-h] [-c CONFIG] {check,list,download,remove} ...

DEEL dataset manager

positional arguments:
  {check,list,download,remove}
                        sub-command help
    check               check config
    list                list datasets
    download            download datasets
    remove              remove local datasets

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        configuration file to use

Listing datasets

The list command can be used to list available and installed datasets.

$ python -m deel.datasets list --help
usage: __main__.py list [-h] [-l] [prov_conf]

positional arguments:
  prov_conf    provider in configuration to use

optional arguments:
  -h, --help   show this help message and exit
  -l, --local  for a non-local provider (e.g., WebDAV), list only local datasets

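If the configuration specifies remote providers (e.g., WebDAV), the list command shows the datasets available remotely from all providers; the optional prov_conf argument selects which provider from the configuration to use. To list only the datasets that have already been downloaded, use the --local option.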

$ python -m deel.datasets list
Listing datasets at https://datasets.company.com:
dataset-a: 3.0.1 [latest], 3.0.0
dataset-b: 1.0 [latest]
dataset-c: 1.0 [latest]
$ python -m deel.datasets list --local
Listing datasets at /opt/datasets:
dataset-a: 3.0.1 [latest], 3.0.0
dataset-c: 1.0 [latest]
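When scripting around the CLI, the two listings shown above can be compared to find datasets that are available remotely but not yet downloaded. The sketch below is hypothetical. The helpers parse_listing and missing_locally are not part of deel-datasets. The sketch assumes the text format of the example output (a "Listing datasets at ...:" header, then one "name: version, version" line per dataset with one version tagged [latest]) remains stable, which this page does not guarantee.

```python
from typing import Dict, List


def parse_listing(output: str) -> Dict[str, List[str]]:
    """Parse `list` output into {dataset name: [versions]} (hypothetical helper)."""
    datasets: Dict[str, List[str]] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "Listing datasets at ...:" header.
        if not line or line.startswith("Listing datasets at"):
            continue
        name, _, versions = line.partition(": ")
        # Drop the "[latest]" tag and keep the bare version strings.
        datasets[name] = [v.replace("[latest]", "").strip()
                          for v in versions.split(",")]
    return datasets


def missing_locally(remote: str, local: str) -> Dict[str, List[str]]:
    """Return, per dataset, the remote versions not present in the local listing."""
    remote_sets, local_sets = parse_listing(remote), parse_listing(local)
    missing: Dict[str, List[str]] = {}
    for name, versions in remote_sets.items():
        absent = [v for v in versions if v not in local_sets.get(name, [])]
        if absent:
            missing[name] = absent
    return missing
```

With the two example listings above, missing_locally returns {"dataset-b": ["1.0"]}. In practice the two strings could be captured with subprocess.run(["python", "-m", "deel.datasets", "list"], capture_output=True, text=True).stdout (and the same call with "--local" appended), assuming the command writes its listing to standard output.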

Downloading datasets

Datasets are automatically downloaded when required, but you can download them manually using the download command.

$ python -m deel.datasets download --help
usage: __main__.py download [-h] [-p [PROV_CONF]] [-f] datasets [datasets ...]

positional arguments:
  datasets              datasets to download, format name:version with :version being optional

optional arguments:
  -h, --help            show this help message and exit
  -p [PROV_CONF], --provider [PROV_CONF]
                        provider in configuration to use
  -f, --force           force download

If the configuration does not specify a remote provider, the command does nothing except display some information. When a dataset is available from multiple providers, the -p argument selects the provider to download it from. The :VERSION suffix can be omitted, in which case :latest is implied. To force a dataset to be downloaded again, use the --force option.

$ python -m deel.datasets download dataset-a:3.0.0
Fetching dataset-a:3.0.0...
dataset-a-3.0.0-20191004.zip: 100%|██████████████████████| 122M/122M [00:03<00:00, 39.3Mbytes/s]
Dataset dataset-a:3.0.0 stored at '/opt/datasets/dataset-a/3.0.0'.
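The name:version argument format described above, with :latest implied when the version is omitted, can be illustrated with a small parser. This is a hypothetical helper written for illustration, not the package's own argument handling.

```python
from typing import Tuple


def parse_dataset_spec(spec: str) -> Tuple[str, str]:
    """Split a download argument of the form name[:version].

    Hypothetical helper: when the version is omitted, "latest" is returned,
    mirroring the documented behavior of the download command.
    """
    name, _, version = spec.partition(":")
    if not name:
        raise ValueError(f"missing dataset name in {spec!r}")
    # An absent (or empty) version falls back to the implied :latest.
    return name, version or "latest"


for arg in ["dataset-a:3.0.0", "dataset-b"]:
    print(arg, "->", parse_dataset_spec(arg))
# dataset-a:3.0.0 -> ('dataset-a', '3.0.0')
# dataset-b -> ('dataset-b', 'latest')
```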

Removing datasets

The remove command can be used to delete local datasets.

$ python -m deel.datasets remove --help
usage: __main__.py remove [-h] [-a ALL] [datasets [datasets ...]]

positional arguments:
  datasets           datasets to remove, format name:version, [...]

optional arguments:
  -h, --help         show this help message and exit
  -a ALL, --all ALL  remove all local datasets

If :VERSION is omitted, the whole dataset NAME is deleted, including all of its versions. If the --all option is used, all datasets are removed from local storage.
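The selection rules of the remove command can be modeled as follows. This is a minimal, hypothetical sketch. It represents local storage as a dictionary of dataset names to downloaded versions instead of the directories the real command deletes (the download example suggests a layout such as /opt/datasets/dataset-a/3.0.0). The function select_for_removal is not part of the package.

```python
from typing import Dict, List, Sequence, Tuple


def select_for_removal(local: Dict[str, List[str]],
                       specs: Sequence[str],
                       remove_all: bool = False) -> List[Tuple[str, str]]:
    """Return the (name, version) pairs that `remove` would delete (model only)."""
    if remove_all:  # --all: every version of every local dataset
        return [(name, v) for name, versions in local.items() for v in versions]
    selected: List[Tuple[str, str]] = []
    for spec in specs:
        name, _, version = spec.partition(":")
        versions = local.get(name, [])
        if version:  # name:version -> only that version, if it is present
            if version in versions:
                selected.append((name, version))
        else:  # bare name -> all local versions of the dataset
            selected.extend((name, v) for v in versions)
    return selected
```

With the local listing from the list example, {"dataset-a": ["3.0.1", "3.0.0"], "dataset-c": ["1.0"]}, the argument dataset-a:3.0.0 selects a single version, dataset-a selects both of its versions, and remove_all=True selects all three entries.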