Accessing the Coadd data using the Data Butler#


Owner: Jim Chiang (@jchiang87)
Last Verified to Run: 2024-03-01 (by @fjaviersanchez)

This notebook shows how to read in the coadd catalog data for an entire tract using the LSST Data Management (DM) data butler, and how to construct a Pandas dataframe with the columns needed to apply the weak lensing cuts used in the recent analysis of Hyper Suprime-Cam data, Mandelbaum et al (2018), following the cuts given by Francois in Slack.

We have a notebook that performs a similar analysis using GCR. See also the more general script for extracting the coadd data from a tract into an hdf5 file.

Learning Objectives#

After completing and studying this notebook, you should be able to:

  1. Read in the coadd catalog data using the DM data butler.

  2. Construct a Pandas data frame from the catalog data.

  3. Apply the HSC weak lensing cuts to a catalog of DM-processed data.

Set Up#

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import lsst.daf.persistence as dp

from desc_dc2_dm_data import REPOS

How To Read DM Science Pipelines Coadd Catalogs into a Pandas Dataframe#

Here’s a function to do this: pass it a butler and some information about what you want, and get a pandas DataFrame back. In this example, we’ll get the forced photometry as well as the model measurements; these are stored in different tables.

REPOS  # display the dictionary of available data repository paths
def read_tract(butler, tract, filter_='i', num_patches=None):
    """
    Read in the coadd catalog data from the specified tract, looping over patches and concatenating
    into a pandas data frame.  
    
    Notes
    -----
    The `merged` coadd catalog is supplemented with the CModel fluxes from the `forced_src` 
    catalog for the desired filter.

    Parameters
    ----------
    butler: lsst.daf.persistence.Butler
        Data butler that points to the data repository of interest, e.g., the output repository
        for the Run1.1p data.
    tract: lsst.skymap.tractInfo.TractInfo
        The object containing the tract information as obtained from a skyMap object, e.g., via 
        `tract = skymap[tractId]` where `tractId` is the tract number.
    filter_: str ['i']
        Filter to use for extracting the `forced_src` fluxes.
    num_patches: int [None]
        Number of patches to consider.  If None, then use all of the available patches in the tract.

    Returns
    -------
    pandas.DataFrame: This will contain the merged coadd data with the band-specific columns.
    """
    if num_patches is None:
        num_patches = len(tract)
    
    df_list = []
    i = 0
    print("Loop  Patch  Nobj")
    print("=================")
    for patch in tract:
        patchId = '%d,%d' % patch.getIndex()
        dataId = dict(filter=filter_, tract=tract.getId(), patch=patchId)

        # Patches for which there is no data will raise a `NoResults` exception when requested
        # from the butler.  We catch those exceptions, note the missing data, and 
        # continue looping over the patches.
        try:
            forced = butler.get('deepCoadd_forced_src', dataId)
            calib = butler.get('deepCoadd_calexp_photoCalib', dataId)
            merged = butler.get('deepCoadd_ref', dataId)
        except dp.NoResults as eobj:
            print(eobj)
            continue

        # Convert the merged coadd catalog into a pandas DataFrame and add the band-specific
        # quantities from the forced_src coadd catalog as new columns.
        data = merged.asAstropy().to_pandas()
        data[filter_ + '_modelfit_CModel_instFlux'] = forced['modelfit_CModel_instFlux']
        data[filter_ + '_modelfit_CModel_instFluxErr'] = forced['modelfit_CModel_instFluxErr']

        # The calib object applies the zero point to get calibrated magnitudes.
        
        _aux = calib.instFluxToMagnitude(forced, 'modelfit_CModel')
        data[filter_ + '_mag_CModel'] = _aux[:, 0]
        data[filter_ + '_mag_err_CModel'] = _aux[:, 1]
        data[filter_ + '_modelfit_CModel_SNR'] \
            = forced['modelfit_CModel_instFlux']/forced['modelfit_CModel_instFluxErr']
        data['ext_shapeHSM_HsmShapeRegauss_abs_e'] \
            = np.hypot(data['ext_shapeHSM_HsmShapeRegauss_e1'],
                       data['ext_shapeHSM_HsmShapeRegauss_e2'])
        df_list.append(data)

        # Print the current patch and number of objects added to the final DataFrame.
        print("   {}  {}   {}".format(i, patchId, len(data)))
        i += 1
        if i == num_patches:
            break
    return pd.concat(df_list)
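The calibration step inside `read_tract` (`calib.instFluxToMagnitude`) is, at its core, the standard conversion from instrumental flux to magnitude via a photometric zero point. Here is a minimal numpy sketch of that conversion; the zero point value below is a made-up illustration, not one read from a real `deepCoadd_calexp_photoCalib` object:

```python
import numpy as np

def instflux_to_mag(inst_flux, inst_flux_err, zero_point=27.0):
    """Convert instrumental fluxes to magnitudes with first-order errors.

    `zero_point` is a hypothetical value for illustration; the real one
    comes from the butler's `deepCoadd_calexp_photoCalib` dataset.
    """
    mag = -2.5 * np.log10(inst_flux) + zero_point
    # d(mag)/d(flux) = -2.5 / (flux * ln 10), so to first order:
    mag_err = 2.5 / np.log(10) * inst_flux_err / inst_flux
    return mag, mag_err

mag, mag_err = instflux_to_mag(np.array([1000.0]), np.array([100.0]))
print(mag[0], mag_err[0])
```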

Accessing the DM Science Pipeline processed data#

In order to access the DM Science Pipeline Processed outputs, including images and catalog data, we summon a data butler and provide the path to the data repository where the DM Science Pipeline tasks have written their outputs.

butler = dp.Butler(REPOS['2.2i_dr6'])

The skyMap object stores the information defining how we decided to divide up the sky into tracts and patches when we did the coaddition. Here we pick a tract that is near the center of the protoDC2 simulation region.

skymap = butler.get('deepCoadd_skyMap')
tractId = 4851
filter_ = 'i'
num_patches = None
data = read_tract(butler, skymap[tractId], filter_=filter_, num_patches=num_patches)
print("Number of objects in the coadd catalog:", len(data))

Making the Selection#

First off, we need to apply a baseline set of cuts to remove NaNs from the columns we will use.

base_mask = ~(np.isnan(data['i_modelfit_CModel_instFlux'])
         | np.isnan(data['ext_shapeHSM_HsmShapeRegauss_resolution'])
         | np.isnan(data['ext_shapeHSM_HsmShapeRegauss_e1']))
data = data[base_mask]
print("Number of objects after baseline cuts:", len(data))

Now we get to the weak lensing cuts. The detect_isPrimary flag identifies the primary object in the overlap regions between patches, i.e., applying it resolves duplicates from the different analyses of overlapping patches.
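As a toy illustration of what this flag does (a synthetic two-patch catalog, not real butler output): an object that falls in the overlap region appears in both patch catalogs, and keeping only rows with `detect_isPrimary` leaves a single copy.

```python
import pandas as pd

# Hypothetical catalog: object 102 lies in the overlap of patches '1,1'
# and '1,2', so it appears twice; only one copy is flagged as primary.
toy = pd.DataFrame({
    'objectId':         [101,   102,   102,   103],
    'patch':            ['1,1', '1,1', '1,2', '1,2'],
    'detect_isPrimary': [True,  True,  False, True],
})

deduped = toy[toy['detect_isPrimary']]
print(len(toy), len(deduped))          # 4 rows before, 3 after
print(deduped['objectId'].is_unique)   # True: one copy per object
```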

mask = data['detect_isPrimary']
mask &= data['deblend_skipped'] == False
mask &= data['base_PixelFlags_flag_edge'] == False
mask &= data['base_PixelFlags_flag_interpolatedCenter'] == False
mask &= data['base_PixelFlags_flag_saturatedCenter'] == False
mask &= data['base_PixelFlags_flag_crCenter'] == False
mask &= data['base_PixelFlags_flag_bad'] == False
mask &= data['base_PixelFlags_flag_suspectCenter'] == False
mask &= data['base_PixelFlags_flag_clipped'] == False
mask &= data['ext_shapeHSM_HsmShapeRegauss_flag'] == False

# Cut on measured object properties
mask &= data['i_modelfit_CModel_SNR'] >= 10
mask &= data['ext_shapeHSM_HsmShapeRegauss_resolution'] >= 0.3
mask &= data['ext_shapeHSM_HsmShapeRegauss_abs_e'] < 2
mask &= data['ext_shapeHSM_HsmShapeRegauss_sigma'] <= 0.4
mask &= data['i_mag_CModel'] < 24.5  # !!! No extinction correction applied
mask &= data['base_Blendedness_abs'] < 10**(-0.375)  # i.e., blendedness < ~0.42
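A self-contained sketch of this property-cut pattern, applied to a synthetic three-object catalog (toy numbers, not DC2 data), showing how the combined boolean mask selects the weak-lensing sample:

```python
import pandas as pd

# Toy catalog: only the first object passes all of the property cuts below
# (the second fails the S/N cut, the third fails the resolution cut).
toy = pd.DataFrame({
    'i_modelfit_CModel_SNR':                   [25.0, 8.0, 40.0],
    'ext_shapeHSM_HsmShapeRegauss_resolution': [0.6,  0.5, 0.2],
    'ext_shapeHSM_HsmShapeRegauss_abs_e':      [0.8,  1.1, 0.3],
    'ext_shapeHSM_HsmShapeRegauss_sigma':      [0.2,  0.3, 0.1],
})

m = toy['i_modelfit_CModel_SNR'] >= 10
m &= toy['ext_shapeHSM_HsmShapeRegauss_resolution'] >= 0.3
m &= toy['ext_shapeHSM_HsmShapeRegauss_abs_e'] < 2
m &= toy['ext_shapeHSM_HsmShapeRegauss_sigma'] <= 0.4

selected = toy[m]
print(len(selected))  # 1
```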

Let’s make some plots of galaxy properties with the above cuts applied. The figure to compare with is Fig. 22 of Mandelbaum et al (2018).

fig = plt.figure(figsize=(10, 10))
hist_kwds = dict(bins=100, histtype='step', density=True, linewidth=2.0, color="black")
p1 = fig.add_subplot(2, 2, 1)
plt.hist(data['ext_shapeHSM_HsmShapeRegauss_resolution'][mask], **hist_kwds);
plt.xlabel('{}-band HSM resolution'.format(filter_))
plt.xlim([0.2,1])
p2 = fig.add_subplot(2, 2, 2)
plt.hist(data[filter_ + '_modelfit_CModel_SNR'][mask], range=(0, 100), **hist_kwds);
plt.xlabel('{}-band CModel S/N'.format(filter_))
plt.xlim([5,100])
p3 = fig.add_subplot(2, 2, 3)
plt.hist(data[filter_ + '_mag_CModel'][mask], **hist_kwds);
plt.xlabel(' {}-band CModel mag'.format(filter_))
plt.xlim([20,25])
p4 = fig.add_subplot(2, 2, 4)
plt.hist(data['ext_shapeHSM_HsmShapeRegauss_abs_e'][mask], **hist_kwds);
plt.xlabel('HSM distortion magnitude |e|')
plt.xlim([0,2]);

Here’s the published figure, for comparison: