Running a pipeline:

TXPipe is built on the ceci software. Running a pipeline requires two files: a pipeline file and a configuration file.

Examples of both can be found in the examples folder. Running the actual pipeline is then as simple as:

ceci examples/laptop_pipeline.yml

So what goes into this sort of file? We will use laptop_pipeline.yml as an example and go through it. For details on ceci, please see the ceci documentation.

The pipeline file:

The first part of the file lists the stages:

stages:
- name: TXSourceSelector     # select and split objects into source bins
- name: TXTruthLensSelector  # select objects for lens bins
- name: PZPDFMLZ             # compute p(z) per galaxy using MLZ
- name: TXPhotozStack        # stack p(z) into n(z)
- name: TXMainMaps           # make source g1, g2 and lens n_gal maps
- name: TXAuxiliaryMaps      # make PSF, depth, flag, and other maps
- name: TXSimpleMask         # combine maps to make a simple mask
- name: TXDensityMaps        # turn mask and ngal maps into overdensity maps
- name: TXMapPlots           # make pictures of all the maps
- name: TXTracerMetadata     # collate metadata
- name: TXRandomCat          # generate lens bin random catalogs
- name: TXJackknifeCenters   # Split the area into jackknife regions
- name: TXTwoPoint           # Compute real-space 2-point correlations
  threads_per_process: 2
- name: TXBlinding           # Blind the data following Muir et al
  threads_per_process: 2
- name: TXTwoPointPlots      # Make plots of 2pt correlations
- name: TXDiagnosticPlots    # Make a suite of diagnostic plots
- name: TXGammaTFieldCenters # Compute and plot gamma_t around center points
  threads_per_process: 2
- name: TXGammaTBrightStars  # Compute and plot gamma_t around bright stars
  threads_per_process: 2
- name: TXGammaTRandoms      # Compute and plot gamma_t around randoms
  threads_per_process: 2
- name: TXGammaTDimStars     # Compute and plot gamma_t around dim stars
  threads_per_process: 2
- name: TXRoweStatistics     # Compute and plot Rowe statistics
  threads_per_process: 2
- name: TXGalaxyStarDensity
- name: TXGalaxyStarShear
- name: TXPSFDiagnostics     # Compute and plot other PSF diagnostics
- name: TXBrighterFatterPlot # Make plots tracking the brighter-fatter effect
- name: TXPhotozPlots        # Plot the bin n(z)
- name: TXConvergenceMaps    # Make convergence kappa maps from g1, g2 maps
- name: TXConvergenceMapPlots # Plot the convergence map
- name: TXMapCorrelations    # plot the correlations between systematics and data

Each entry names a pipeline stage that needs to be run, and each name refers to one of the stages implemented in TXPipe. Note that a few entries also set threads_per_process: 2, which indicates that these stages should be run with two threads per process. Another available option is nprocess: 32, which would run the stage on 32 processes.

Next comes modules, which lists the modules and packages in which the pipeline stages are defined:
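To make the per-stage settings concrete, the following sketch shows how a stage list like the one above could be interpreted. It is an illustration only, not ceci's implementation: the entries are written directly as Python dictionaries rather than read from the YAML file, the function name describe_stages is made up for this example, the nprocess value is attached to TXRandomCat purely for illustration, and the default of one process and one thread per stage is an assumption.

```python
# A hand-written subset of the stage list above, as Python dictionaries.
stages = [
    {"name": "TXSourceSelector"},
    {"name": "TXTwoPoint", "threads_per_process": 2},
    {"name": "TXRandomCat", "nprocess": 32},  # hypothetical use of nprocess
]


def describe_stages(stage_list):
    """Return (name, nprocess, threads_per_process) for each stage.

    Assumption for this sketch: a stage that sets neither option runs
    as one process with one thread.
    """
    summary = []
    for entry in stage_list:
        nprocess = entry.get("nprocess", 1)
        threads = entry.get("threads_per_process", 1)
        summary.append((entry["name"], nprocess, threads))
    return summary


for name, nprocess, threads in describe_stages(stages):
    print(f"{name}: {nprocess} process(es) x {threads} thread(s)")
```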

modules: txpipe

python_paths:
- submodules/WLMassMap/python/desc/
- submodules/TJPCov
- submodules/FlexZPipe

The python_paths entry lists where to find modules we need that are not in the TXPipe repo.

Then comes:

output_dir: data/example/outputs

This is where all outputs from the stages will be saved.

The launcher determines how ceci schedules the stages. Currently there are three options available: mini, parsl, and cwl.

See Ceci’s documentation on Launchers. In the example we use:

launcher:
    name: mini
    interval: 1.0

Interval is how often ceci checks if stages have completed.

Next comes site, which is again a ceci configuration detail:

site:
    name: local
    max_threads: 2

It tells ceci where the code is to be run, and max_threads sets the maximum number of threads the code will be run on.

Then we have config, which points to the configuration file, the second of the two files mentioned at the start:

config: examples/config/laptop_config.yml

Then we have the inputs:

inputs:
    # See README for paths to download these files
    shear_catalog: data/example/inputs/shear_catalog.hdf5
    photometry_catalog: data/example/inputs/photometry_catalog.hdf5
    photoz_trained_model: data/example/inputs/cosmoDC2_trees_i25.3.npy
    calibration_table: data/example/inputs/sample_cosmodc2_w10year_errors.dat
    exposures: data/example/inputs/exposures.hdf5
    star_catalog: data/example/inputs/star_catalog.hdf5
    # This file comes with the code
    fiducial_cosmology: data/fiducial_cosmology.yml

These are the locations of all the inputs that the pipeline will be run on. In the code the inputs are referred to by their names (shear_catalog, etc.); this section is where you specify which file each name actually points to.
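Since the pipeline cannot start without these files, it can be convenient to check that they exist before launching. The sketch below is one way to do that under simple assumptions: the inputs are copied by hand into a Python dictionary (a subset of the section above), and find_missing_inputs is a helper written for this example, not part of ceci or TXPipe.

```python
from pathlib import Path

# A subset of the inputs section above, written as a Python dictionary.
inputs = {
    "shear_catalog": "data/example/inputs/shear_catalog.hdf5",
    "photometry_catalog": "data/example/inputs/photometry_catalog.hdf5",
    "fiducial_cosmology": "data/fiducial_cosmology.yml",
}


def find_missing_inputs(input_paths, base_dir="."):
    """Return the sorted names whose files do not exist under base_dir."""
    base = Path(base_dir)
    return sorted(
        name for name, path in input_paths.items()
        if not (base / path).is_file()
    )


missing = find_missing_inputs(inputs)
if missing:
    print("Missing input files for:", ", ".join(missing))
else:
    print("All listed input files were found.")
```

Run from the top of the TXPipe repository, this reports any name whose file has not yet been downloaded.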

Finally a few more ceci details:

resume: True

log_dir: data/example/logs

pipeline_log: data/example/log.txt

The first option, resume, controls whether a restarted pipeline resumes from where it stopped, if possible, or starts over. The second, log_dir, is the directory where the per-stage log files, each detailing what that stage has done, are saved. Finally, pipeline_log is where the overall parsl pipeline log is saved.
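As a rough mental model of resume, the sketch below decides which stages still need to run. It is a toy illustration, not ceci's code: it assumes that a stage counts as finished only when all of its output files already exist in the output directory, and both the function stages_to_run and the output file names are hypothetical.

```python
from pathlib import Path


def stages_to_run(stage_outputs, output_dir, resume=True):
    """Return the names of stages that still need to run.

    stage_outputs maps a stage name to the output file names it writes.
    Assumption for this sketch: with resume enabled, a stage is skipped
    only if all of its output files already exist in output_dir.
    """
    out = Path(output_dir)
    todo = []
    for name, filenames in stage_outputs.items():
        done = all((out / f).exists() for f in filenames)
        if not (resume and done):
            todo.append(name)
    return todo


# Hypothetical output file names, for illustration only.
example_outputs = {
    "TXSourceSelector": ["shear_tomography_catalog.hdf5"],
    "TXTwoPoint": ["twopoint_data_real_raw.sacc"],
}
print(stages_to_run(example_outputs, "data/example/outputs", resume=True))
```

With resume=False every stage is returned, which corresponds to starting over.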

Config file:

Let us take a look at what the configuration file looks like. First we have global, which holds configuration options that are shared across all stages:

global:
  # This is read by many stages that read complete
  # catalog data, and tells them how many rows to read
  # at once
  chunk_rows: 100000
  # These mapping options are also read by a range of stages
  pixelization: healpix
  nside: 512
  sparse: True  # Generate sparse maps - faster if using small areas

Next come the options for each stage. Options listed here override the defaults given at the beginning of the corresponding stage's code. As an example we can look at TXTwoPoint:

TXTwoPoint:
  binslop: 0.1
  delta_gamma: 0.02
  do_pos_pos: True
  do_shear_shear: True
  do_shear_pos: True
  flip_g2: True  # use true when using metacal shears
  min_sep: 2.5
  max_sep: 60.0
  nbins: 10
  verbose: 0
  subtract_mean_shear: True

Each line here overrides the default configuration given in the code of the TXTwoPoint stage:

config_options = {
      'calcs':[0,1,2],
      'min_sep':0.5,
      'max_sep':300.,
      'nbins':9,
      'bin_slop':0.1,
      'sep_units':'arcmin',
      'flip_g2':True,
      'cores_per_task':20,
      'verbose':1,
      'source_bins':[-1],
      'lens_bins':[-1],
      'reduce_randoms_size':1.0,
      'do_shear_shear': True,
      'do_shear_pos': True,
      'do_pos_pos': True,
      'var_methods': 'jackknife',
      'use_true_shear': False,
      'subtract_mean_shear':False,
}

Note

We don’t need to replace all options; any option we don’t replace simply keeps the default value given in the stage.
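The way defaults and overrides combine can be sketched in a few lines of Python. This is a minimal illustration, not the actual ceci logic: the dictionaries are hand-copied subsets of the examples above, resolve_options is a name invented here, and the precedence order (stage defaults, then global, then the stage's own section) is an assumption.

```python
# Defaults as they might appear in a stage's config_options (subset).
stage_defaults = {
    "min_sep": 0.5,
    "max_sep": 300.0,
    "nbins": 9,
    "sep_units": "arcmin",
    "verbose": 1,
}

# Values as they might appear in the configuration file (subset).
config_file = {
    "global": {"chunk_rows": 100000, "nside": 512},
    "TXTwoPoint": {"min_sep": 2.5, "max_sep": 60.0, "nbins": 10, "verbose": 0},
}


def resolve_options(stage_name, defaults, config):
    """Combine defaults, global options and stage-specific options.

    Assumed precedence for this sketch, lowest to highest:
    stage defaults, then the global section, then the stage's own section.
    """
    resolved = dict(defaults)
    resolved.update(config.get("global", {}))
    resolved.update(config.get(stage_name, {}))
    return resolved


options = resolve_options("TXTwoPoint", stage_defaults, config_file)
print(options["min_sep"], options["nbins"], options["sep_units"])
```

Here min_sep and nbins come from the configuration file, while sep_units, which the file does not mention, keeps its default.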