The dregs CLI

The DESC data registry also comes with a Command Line Interface (CLI) tool, dregs, which can list, register and delete datasets.

See the tutorials section for a demonstration of its usage.

dregs

The data registry CLI interface

usage: dregs [-h] {ls,register,delete} ...

options

-h, --help

show this help message and exit

dregs delete

usage: dregs delete [-h] {dataset} ...

-h, --help

show this help message and exit

dregs delete dataset

usage: dregs delete dataset [-h] [--config_file CONFIG_FILE]
                            [--root_dir ROOT_DIR] [--site SITE]
                            [--schema SCHEMA]
                            dataset_id

dataset_id

The dataset_id you wish to delete

-h, --help

show this help message and exit

--config_file <config_file>

Location of data registry config file

--root_dir <root_dir>

Location of the root_dir

--site <site>

Get the root_dir through a pre-defined ‘site’

--schema <schema>

Which schema to connect to
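
As a minimal sketch of how the arguments above fit together, the following Python snippet assembles a dregs delete dataset command line without running it. The helper build_delete_command, the dataset id 123 and the config file path are hypothetical placeholders chosen for illustration; they are not part of the data registry itself.

```python
import shlex


def build_delete_command(dataset_id, config_file=None, root_dir=None,
                         site=None, schema=None):
    """Assemble (but do not run) a `dregs delete dataset` command.

    Hypothetical helper for illustration; only the flag names come from
    the usage text above.
    """
    cmd = ["dregs", "delete", "dataset"]
    optional = [
        ("--config_file", config_file),
        ("--root_dir", root_dir),
        ("--site", site),
        ("--schema", schema),
    ]
    # Optional flags are only added when a value is supplied.
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    # dataset_id is the single positional argument.
    cmd.append(str(dataset_id))
    return cmd


# Placeholder values, not taken from a real registry.
delete_cmd = build_delete_command(123, config_file="/path/to/registry_config")
print(shlex.join(delete_cmd))
```

The resulting list could be handed to subprocess.run once the id has been checked, since deleting is not something to do by accident.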

dregs ls

usage: dregs ls [-h] [--owner OWNER] [--owner_type {user,group,production}]
                [--all] [--config_file CONFIG_FILE] [--root_dir ROOT_DIR]
                [--site SITE] [--schema SCHEMA]

-h, --help

show this help message and exit

--owner <owner>

List datasets for a given owner

--owner_type {user,group,production}

List datasets for a given owner type

--all

List all datasets

--config_file <config_file>

Location of data registry config file

--root_dir <root_dir>

Location of the root_dir

--site <site>

Get the root_dir through a pre-defined ‘site’

--schema <schema>

Which schema to connect to
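
The filtering options of dregs ls can be combined as in the sketch below, which builds a few command lines and prints them rather than executing anything. The helper build_ls_command and the owner name alice are hypothetical; the flag names and the allowed owner types are taken from the usage text above.

```python
import shlex

# Choices accepted by `dregs ls --owner_type`, per the usage text.
LS_OWNER_TYPES = ("user", "group", "production")


def build_ls_command(owner=None, owner_type=None, list_all=False, schema=None):
    """Assemble (but do not run) a `dregs ls` command (hypothetical helper)."""
    cmd = ["dregs", "ls"]
    if owner is not None:
        cmd += ["--owner", owner]
    if owner_type is not None:
        if owner_type not in LS_OWNER_TYPES:
            raise ValueError(f"owner_type must be one of {LS_OWNER_TYPES}")
        cmd += ["--owner_type", owner_type]
    if list_all:
        cmd.append("--all")
    if schema is not None:
        cmd += ["--schema", schema]
    return cmd


ls_examples = [
    build_ls_command(owner="alice"),            # datasets for one owner
    build_ls_command(owner_type="production"),  # datasets for one owner type
    build_ls_command(list_all=True),            # every dataset
]
for cmd in ls_examples:
    print(shlex.join(cmd))
```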

dregs register

usage: dregs register [-h] {dataset} ...

-h, --help

show this help message and exit

dregs register dataset

usage: dregs register dataset [-h] [--name NAME]
                              [--version_suffix VERSION_SUFFIX]
                              [--creation_date CREATION_DATE]
                              [--access_API ACCESS_API] [--owner OWNER]
                              [--owner_type {user,group,project,production}]
                              [--description DESCRIPTION]
                              [--execution_id EXECUTION_ID]
                              [--is_overwritable]
                              [--location_type {dataregistry,external,dummy}]
                              [--url URL] [--contact_email CONTACT_EMAIL]
                              [--old_location OLD_LOCATION] [--make_symlink]
                              [--execution_name EXECUTION_NAME]
                              [--execution_description EXECUTION_DESCRIPTION]
                              [--execution_start EXECUTION_START]
                              [--execution_site EXECUTION_SITE]
                              [--execution_configuration EXECUTION_CONFIGURATION]
                              [--input_datasets INPUT_DATASETS [INPUT_DATASETS ...]]
                              [--config_file CONFIG_FILE]
                              [--root_dir ROOT_DIR] [--site SITE]
                              [--schema SCHEMA]
                              relative_path version

relative_path

Destination for the dataset within the data registry. Path is relative to <registry root>/<owner_type>/<owner>.

version

Semantic version string of the format MAJOR.MINOR.PATCH or a special flag “patch”, “minor” or “major”. When a special flag is used it automatically bumps the relative version for you (see examples for more details).
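
To illustrate what bumping a version means, here is a small sketch of the conventional semantic versioning rule, in which bumping one part resets the parts to its right to zero. This is an assumption about the intended behaviour, not the data registry's own code; the function bump_version is hypothetical, and how the registry determines the previous version is not covered here.

```python
def bump_version(current, flag):
    """Return the next MAJOR.MINOR.PATCH string for a given bump flag.

    Illustrative only: assumes conventional semantic versioning rules.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if flag == "major":
        return f"{major + 1}.0.0"
    if flag == "minor":
        return f"{major}.{minor + 1}.0"
    if flag == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError("flag must be 'major', 'minor' or 'patch'")


print(bump_version("1.2.3", "patch"))  # 1.2.4
print(bump_version("1.2.3", "minor"))  # 1.3.0
print(bump_version("1.2.3", "major"))  # 2.0.0
```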

-h, --help

show this help message and exit

--name <name>

Any convenient, evocative, human-readable name. Note that the combination of name, version and version_suffix must be unique. If None, the name is generated from the relative path.

--version_suffix <version_suffix>

Optional version suffix to place at the end of the version string. Cannot be used for production datasets.

--creation_date <creation_date>

Dataset creation date

--access_API <access_api>

Describes the software that can read the dataset (e.g., ‘gcr-catalogs’, ‘skyCatalogs’)

--owner <owner>

Owner of the dataset (defaults to $USER)

--owner_type {user,group,project,production}

Dataset's owner type; can be ‘user’, ‘group’, ‘project’ or ‘production’. (default=user)

--description <description>

User provided human-readable description of the dataset

--execution_id <execution_id>

Execution this dataset is linked to

--is_overwritable

If set, this dataset can be overwritten in the future

--location_type {dataregistry,external,dummy}

What is the physical location of the data? ‘dataregistry’ means the data is located within <root_dir> and managed by the dataregistry. ‘external’ means the data is not managed by the dataregistry, either because it is off-site or because it is stored outside <root_dir>; only a database entry is created, and a url or contact_email must be provided during registration. ‘dummy’ is a dataset for testing purposes only (only a database entry is created). (default=dataregistry)

--url <url>

URL that points to the data (used in the case of external datasets, i.e., location_type=‘external’).

--contact_email <contact_email>

Contact information for someone regarding the dataset.

--old_location <old_location>

Absolute location of dataset to copy. If None, the dataset should already be at the correct relative_path.

--make_symlink

Flag to make a symlink to the data rather than copying any files.

--execution_name <execution_name>

Typically pipeline name or program name

--execution_description <execution_description>

Human-readable description of the execution

--execution_start <execution_start>

Date the execution started

--execution_site <execution_site>

Where was the execution performed?

--execution_configuration <execution_configuration>

Path to text file used to configure the execution

--input_datasets <input_datasets>

List of dataset ids that were the input to this execution

--config_file <config_file>

Location of data registry config file

--root_dir <root_dir>

Location of the root_dir

--site <site>

Get the root_dir through a pre-defined ‘site’

--schema <schema>

Which schema to connect to
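
The sketch below shows how a few typical dregs register dataset invocations might be put together. It only builds and prints the command lines. The helper build_register_command and every value passed to it (paths, names, URL, dataset ids) are hypothetical placeholders; only the option names come from the usage text above. The positional arguments are placed before the options so that a multi-value option such as --input_datasets cannot absorb them.

```python
import shlex


def build_register_command(relative_path, version, **options):
    """Assemble (but do not run) a `dregs register dataset` command.

    Hypothetical helper: keyword names are used verbatim as option names,
    e.g. owner_type="group" becomes `--owner_type group`.
    """
    # Positionals first, so multi-value options cannot swallow them.
    cmd = ["dregs", "register", "dataset", relative_path, version]
    for name, value in options.items():
        flag = f"--{name}"
        if value is None or value is False:
            continue                      # option not requested
        if value is True:
            cmd.append(flag)              # on/off flags such as --is_overwritable
        elif isinstance(value, (list, tuple)):
            cmd += [flag, *(str(v) for v in value)]
        else:
            cmd += [flag, str(value)]
    return cmd


# 1. A dataset copied into the registry from an existing directory.
basic_cmd = build_register_command(
    "my_project/catalog", "1.0.0",
    name="my_catalog",
    description="Example catalog",
    old_location="/path/to/existing/data",
)

# 2. An external dataset: only a database entry, so a url is supplied.
external_cmd = build_register_command(
    "external/catalog", "0.1.0",
    location_type="external",
    url="https://example.org/data",
)

# 3. A patch bump of an existing dataset, linked to its execution inputs.
bumped_cmd = build_register_command(
    "my_project/catalog", "patch",
    is_overwritable=True,
    execution_name="my_pipeline",
    input_datasets=[1, 2],
)

for cmd in (basic_cmd, external_cmd, bumped_cmd):
    print(shlex.join(cmd))
```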