Database structure
The database schemas
There are two primary database schemas which the majority of users will work with:
The “default” schema, whose name is set by the hard-coded variable DEFAULT_SCHEMA_WORKING in the src/dataregistry/db_basic.py file. It can be imported with:
from dataregistry.db_basic import DEFAULT_SCHEMA_WORKING
The production schema. This is where production datasets are stored, and general users have read-only access to it. By default this schema is named “production”, but a different name can be specified during schema creation (see below); this should only be done for testing purposes.
Users can specify which schema to connect to when initializing the DataRegistry object (by default, DEFAULT_SCHEMA_WORKING is used). To connect to the production schema, its name must be entered manually (see the production schema tutorial). Users can also connect to a custom schema by entering its name, but that schema must already have been created.
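The schema-selection rules above can be summarised in a small sketch. It is not the DataRegistry implementation: the function resolve_schema, the placeholder value "working_placeholder" standing in for DEFAULT_SCHEMA_WORKING, and the existing_schemas set are all hypothetical, assuming the caller already knows which schemas exist in the database.

```python
from typing import Optional, Set

# Hypothetical placeholder; the real name is dataregistry.db_basic.DEFAULT_SCHEMA_WORKING.
DEFAULT_SCHEMA_WORKING = "working_placeholder"


def resolve_schema(requested: Optional[str], existing_schemas: Set[str]) -> str:
    """Pick the schema to connect to (hypothetical helper).

    - No name given: fall back to the default working schema.
    - Any other name (including the production schema) must be given
      explicitly and must already exist in the database.
    """
    schema = DEFAULT_SCHEMA_WORKING if requested is None else requested
    if schema not in existing_schemas:
        raise ValueError(
            f"Schema '{schema}' does not exist; create it with "
            "create_registry_schema.py first"
        )
    return schema


existing = {DEFAULT_SCHEMA_WORKING, "production"}
print(resolve_schema(None, existing))          # working_placeholder
print(resolve_schema("production", existing))  # production
```

Requiring custom schemas to exist up front mirrors the text: connecting to a schema nobody has created is an error, not something the connection step fixes.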
When using SQLite as the backend (useful for testing), the concept of schemas does not exist.
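One way to picture the difference is how a table name is qualified. The helper below is a hypothetical sketch (the name qualified_table_name, the backend strings and the example table name "dataset" are illustrative assumptions, not part of the dataregistry API): on Postgres a table is addressed as schema.table, while on SQLite the schema name is simply dropped.

```python
from typing import Optional


def qualified_table_name(table: str, schema: Optional[str], backend: str) -> str:
    """Return the name used to address a table on the given backend (hypothetical helper)."""
    if backend == "sqlite":
        # SQLite has no schemas in the Postgres sense, so the table name is used unqualified.
        return table
    if backend == "postgresql":
        if not schema:
            raise ValueError("A schema name is required for the Postgres backend")
        return f"{schema}.{table}"
    raise ValueError(f"Unknown backend: {backend}")


print(qualified_table_name("dataset", "production", "postgresql"))  # production.dataset
print(qualified_table_name("dataset", "production", "sqlite"))      # dataset
```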
First-time creation of database schemas
In the top-level scripts directory there is a create_registry_schema.py script that performs the initial schema creation. This script must be run before the data registry can be used, for both the Postgres and SQLite backends.
First, make sure your ~/.config_reg_access and ~/.pgpass files are correctly set up (see “Getting set up” for more information on these configuration files).
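Before running the script, a quick pre-flight check can catch missing configuration files. This is a minimal sketch, not part of the repository (check_config_files is a hypothetical name), and it does not validate the contents of ~/.config_reg_access. It does rely on one documented libpq behaviour: on Unix systems the password file is ignored if it grants any access to group or others, so ~/.pgpass should have mode 0600.

```python
import os
import stat
from pathlib import Path
from typing import List


def check_config_files(home: Path) -> List[str]:
    """Return problems found with the data registry config files (empty list if none)."""
    problems = []
    config = home / ".config_reg_access"
    pgpass = home / ".pgpass"

    if not config.is_file():
        problems.append(f"missing {config}")
    if not pgpass.is_file():
        problems.append(f"missing {pgpass}")
    elif os.name == "posix":
        mode = stat.S_IMODE(pgpass.stat().st_mode)
        # libpq ignores ~/.pgpass if group or others have any access to it.
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append(f"{pgpass} has mode {oct(mode)}; run 'chmod 0600 {pgpass}'")
    return problems


if __name__ == "__main__":
    for problem in check_config_files(Path.home()):
        print("WARNING:", problem)
```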
When creating schemas at NERSC, make sure the SPIN instance of the Postgres
database is running.
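A simple way to confirm the database is reachable before creating schemas is to open a TCP connection to its host and port. This sketch only checks that something is listening; it does not authenticate. The function name is_reachable is hypothetical, 5432 is merely the standard Postgres port, and the host should be taken from your own ~/.config_reg_access rather than the "localhost" fallback used here.

```python
import socket
import sys


def is_reachable(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Usage: python3 check_db.py <host> [port]  (host from your ~/.config_reg_access)
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 5432
    print(f"{host}:{port} reachable: {is_reachable(host, port)}")
```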
The script must be run twice, first for the production schema and then for the working schema, or once with the --create_both argument. There are four arguments that can be specified (all optional):
--config : Location of the data registry configuration file (~/.config_reg_access by default)
--schema : The name of the schema (default is DEFAULT_SCHEMA_WORKING)
--production-schema : The name of the production schema (default “production”)
--create_both : Create both the production schema and working schema in one call (the production schema will be made first, then the working schema)
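To make the argument semantics concrete, the sketch below mirrors the documented options with argparse and works out which schemas a single invocation would create. It is a hypothetical reconstruction, not the actual contents of create_registry_schema.py: the helper names are invented, "working_placeholder" stands in for DEFAULT_SCHEMA_WORKING, and the assumption that a call without --create_both creates only the schema named by --schema is this sketch's, not the script's documented behaviour.

```python
import argparse
import os
from typing import List, Optional

# Placeholder; the real default comes from dataregistry.db_basic.DEFAULT_SCHEMA_WORKING.
DEFAULT_SCHEMA_WORKING = "working_placeholder"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Create data registry schemas (sketch)")
    parser.add_argument("--config", default=os.path.expanduser("~/.config_reg_access"),
                        help="Location of the data registry configuration file")
    parser.add_argument("--schema", default=DEFAULT_SCHEMA_WORKING,
                        help="Name of the schema to create")
    parser.add_argument("--production-schema", default="production",
                        help="Name of the production schema")
    parser.add_argument("--create_both", action="store_true",
                        help="Create the production schema, then the working schema")
    return parser


def schemas_to_create(args: argparse.Namespace) -> List[str]:
    """Return the schemas this call would create, in creation order."""
    if args.create_both:
        # Documented order: production schema first, then the working schema.
        return [args.production_schema, args.schema]
    # Assumption of this sketch: otherwise only the --schema target is created.
    return [args.schema]


def main(argv: Optional[List[str]] = None) -> List[str]:
    args = build_parser().parse_args(argv)
    order = schemas_to_create(args)
    for name in order:
        print(f"would create schema: {name}")
    return order


if __name__ == "__main__":
    main()
```

For example, main(["--create_both"]) returns ["production", "working_placeholder"], matching the production-then-working order described for --create_both.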
The typical initialization would be:
python3 create_registry_schema.py --create_both