Running a pipeline:
===================

TXPipe is built on the ceci software. Running a pipeline requires two files: a *configuration file* and a *pipeline file*. Examples of both can be found in the examples folder.

Running the pipeline is then as simple as::

    ceci examples/laptop_pipeline.yml

So what goes into this sort of file? We will use *laptop_pipeline.yml* as an example and go through it. For details on Ceci please see the Ceci documentation.

The pipeline file:
------------------

The first part of the file is the list of stages::

    stages:
        - name: TXSourceSelector       # select and split objects into source bins
        - name: TXTruthLensSelector    # select objects for lens bins
        - name: PZPDFMLZ               # compute p(z) per galaxy using MLZ
        - name: TXPhotozStack          # stack p(z) into n(z)
        - name: TXMainMaps             # make source g1, g2 and lens n_gal maps
        - name: TXAuxiliaryMaps        # make PSF, depth, flag, and other maps
        - name: TXSimpleMask           # combine maps to make a simple mask
        - name: TXDensityMaps          # turn mask and ngal maps into overdensity maps
        - name: TXMapPlots             # make pictures of all the maps
        - name: TXTracerMetadata       # collate metadata
        - name: TXRandomCat            # generate lens bin random catalogs
        - name: TXJackknifeCenters     # Split the area into jackknife regions
        - name: TXTwoPoint             # Compute real-space 2-point correlations
          threads_per_process: 2
        - name: TXBlinding             # Blind the data following Muir et al
          threads_per_process: 2
        - name: TXTwoPointPlots        # Make plots of 2pt correlations
        - name: TXDiagnosticPlots      # Make a suite of diagnostic plots
        - name: TXGammaTFieldCenters   # Compute and plot gamma_t around center points
          threads_per_process: 2
        - name: TXGammaTBrightStars    # Compute and plot gamma_t around bright stars
          threads_per_process: 2
        - name: TXGammaTRandoms        # Compute and plot gamma_t around randoms
          threads_per_process: 2
        - name: TXGammaTDimStars       # Compute and plot gamma_t around dim stars
          threads_per_process: 2
        - name: TXRoweStatistics       # Compute and plot Rowe statistics
          threads_per_process: 2
        - name: TXGalaxyStarDensity
        - name: TXGalaxyStarShear
        - name: TXPSFDiagnostics       # Compute and plots other PSF diagnostics
        - name: TXBrighterFatterPlot   # Make plots tracking the brighter-fatter effect
        - name: TXPhotozPlots          # Plot the bin n(z)
        - name: TXConvergenceMaps      # Make convergence kappa maps from g1, g2 maps
        - name: TXConvergenceMapPlots  # Plot the convergence map
        - name: TXMapCorrelations      # plot the correlations between systematics and data

Each entry names a pipeline stage that needs to be run, and each name refers to one of the stages implemented in TXPipe. Note that a few entries also set ``threads_per_process: 2``, which indicates that these stages should be run with more threads. Another available option is ``nprocess: 32``, which would run the stage on 32 processes.

Next comes ``modules``, which lists the Python modules and packages in which the pipeline stages are defined::

    modules: txpipe

    python_paths:
        - submodules/WLMassMap/python/desc/
        - submodules/TJPCov
        - submodules/FlexZPipe

``python_paths`` lists where to find the modules we need that are not in the TXPipe repository.

Then comes::

    output_dir: data/example/outputs

This is where all outputs from the stages will be saved.

``launcher`` determines how ceci schedules the stages. Currently there are three options available:

* mini
* parsl
* cwl

See Ceci's documentation on launchers. In the example we use::

    launcher:
        name: mini
        interval: 1.0

``interval`` is how often ceci checks whether stages have completed.

Next comes ``site``, which is again a ceci setting (see the Ceci documentation for details)::

    site:
        name: local
        max_threads: 2

This tells ceci where the code is to be run and the maximum number of threads it may use.

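
To make the per-stage options described above concrete, the following Python sketch reports the process and thread counts for a few of the stages in the example. It is illustrative only: the ``stages`` list is hand-copied from the example rather than read from the YAML file, the helper ``stage_resources`` is hypothetical and not part of ceci, and the defaults of one process and one thread for stages that set neither option are an assumption.

```python
# Hand-copied subset of the ``stages`` section, written as Python data.
stages = [
    {"name": "TXSourceSelector"},
    {"name": "TXTwoPoint", "threads_per_process": 2},
    {"name": "TXBlinding", "threads_per_process": 2},
]


def stage_resources(stage, default_nprocess=1, default_threads=1):
    """Return (name, nprocess, threads_per_process) for one stage entry.

    Hypothetical helper; the defaults of 1 are an assumption of this sketch.
    """
    return (
        stage["name"],
        stage.get("nprocess", default_nprocess),
        stage.get("threads_per_process", default_threads),
    )


for stage in stages:
    name, nprocess, threads = stage_resources(stage)
    print(f"{name}: {nprocess} process(es) x {threads} thread(s)")
```

Stages that request more threads than the ``max_threads`` value given under ``site`` are worth checking before launching a run.
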
Then we have ``config``, which points to the configuration file, the second of the two required files::

    config: examples/config/laptop_config.yml

Then we have the *inputs*::

    inputs:
        # See README for paths to download these files
        shear_catalog: data/example/inputs/shear_catalog.hdf5
        photometry_catalog: data/example/inputs/photometry_catalog.hdf5
        photoz_trained_model: data/example/inputs/cosmoDC2_trees_i25.3.npy
        calibration_table: data/example/inputs/sample_cosmodc2_w10year_errors.dat
        exposures: data/example/inputs/exposures.hdf5
        star_catalog: data/example/inputs/star_catalog.hdf5
        # This file comes with the code
        fiducial_cosmology: data/fiducial_cosmology.yml

These are the locations of all the inputs that the pipeline will be run on. In the code the inputs are referred to by name, e.g. ``shear_catalog``; this section specifies which file each name points to.

Finally, a few more ceci details::

    resume: True
    log_dir: data/example/logs
    pipeline_log: data/example/log.txt

``resume`` controls whether a restarted pipeline should, where possible, resume from where it stopped rather than start over. ``log_dir`` is where the per-stage log files, detailing what each stage has done, are saved, while ``pipeline_log`` is where the overall parsl pipeline log is saved.

Config file:
------------

Let us take a look at what the *configuration file* looks like. First we have ``global``, which holds configuration options that are shared across all stages::

    global:
        # This is read by many stages that read complete
        # catalog data, and tells them how many rows to read
        # at once
        chunk_rows: 100000
        # These mapping options are also read by a range of stages
        pixelization: healpix
        nside: 512
        sparse: True   # Generate sparse maps - faster if using small areas

Next come the options for each stage. Options listed here override the default options given at the beginning of the corresponding stage.

As an example we can look at ``TXTwoPoint``::

    TXTwoPoint:
        binslop: 0.1
        delta_gamma: 0.02
        do_pos_pos: True
        do_shear_shear: True
        do_shear_pos: True
        flip_g2: True  # use true when using metacal shears
        min_sep: 2.5
        max_sep: 60.0
        nbins: 10
        verbose: 0
        subtract_mean_shear: True

Each line here overrides the standard configuration given in the ``TXTwoPoint`` stage::

    config_options = {
        'calcs': [0, 1, 2],
        'min_sep': 0.5,
        'max_sep': 300.,
        'nbins': 9,
        'bin_slop': 0.1,
        'sep_units': 'arcmin',
        'flip_g2': True,
        'cores_per_task': 20,
        'verbose': 1,
        'source_bins': [-1],
        'lens_bins': [-1],
        'reduce_randoms_size': 1.0,
        'do_shear_shear': True,
        'do_shear_pos': True,
        'do_pos_pos': True,
        'var_methods': 'jackknife',
        'use_true_shear': False,
        'subtract_mean_shear': False,
    }

.. note::
    We do not need to replace every option; any option we do not override keeps the default value given in the stage definition.
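
The override behaviour can be pictured as a dictionary merge. The Python sketch below is not the actual TXPipe or ceci implementation; ``resolve_options`` is a hypothetical helper, and only a few of the ``TXTwoPoint`` defaults and overrides listed above are copied in.

```python
# A few of the TXTwoPoint defaults from ``config_options`` above.
defaults = {
    "min_sep": 0.5,
    "max_sep": 300.0,
    "nbins": 9,
    "sep_units": "arcmin",
    "subtract_mean_shear": False,
}

# The matching entries from the TXTwoPoint section of the configuration file.
overrides = {
    "min_sep": 2.5,
    "max_sep": 60.0,
    "nbins": 10,
    "subtract_mean_shear": True,
}


def resolve_options(defaults, overrides):
    """Hypothetical helper: start from the defaults, then apply overrides."""
    resolved = dict(defaults)    # copy so the defaults are left untouched
    resolved.update(overrides)   # configuration-file values take precedence
    return resolved


options = resolve_options(defaults, overrides)
print(options["nbins"])      # overridden in the configuration file
print(options["sep_units"])  # not overridden, so the default is kept
```

In this picture ``sep_units`` is never mentioned in the configuration file, so the stage default of ``arcmin`` carries through unchanged.
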