Validation tests on DC2 calexps and src catalogs

Owner: Javier Sánchez. Date last run: Apr-21-2019

This notebook is intended to document part of the ongoing validation work for the DC2 simulations. For more details please check here. The code here can be adapted for other DC2 runs.

%pylab inline

Temporary fix while GCRCatalogs is updated

import sys
sys.path.insert(0, '/global/homes/j/jsanch87/gcr-catalogs/')
import GCRCatalogs
import GCR
from GCRCatalogs.butler_interface import SingleVisitCatalog
import lsst.daf.persistence
from scipy.stats import binned_statistic
import matplotlib
matplotlib.rcParams.update({'font.size': 14})

We are going to read some calexps and the OpSim database to select interesting visits

data_imsim = '/global/cscratch1/sd/desc/DC2/data/Run1.2i_globus_in2p3_20181217/w_2018_39/rerun/281118'
data_phosim = '/global/cscratch1/sd/desc/DC2/data/Run1.2p_v4/w_2018_39/rerun/calexp-v4'
db_file = '/global/projecta/projectdirs/lsst/groups/SSim/DC2/minion_1016_desc_dithered_v4.db'
import sqlite3
from sqlite3 import Error
 
def create_connection(db_file):
    try:
        conn = sqlite3.connect(db_file)
        return conn
    except Error as e:
        print(e)
 
    return None
conn = create_connection(db_file)
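
Before writing the query in the next cell, it can be handy to confirm that the ObsHistory table actually exposes the columns we are about to use (altitude, finSeeing, and filter); a minimal sketch using SQLite's PRAGMA:

# List the columns of the ObsHistory table so we can confirm that
# `altitude`, `finSeeing`, and `filter` are available before querying them below.
cur = conn.cursor()
cur.execute("PRAGMA table_info(ObsHistory)")
print([row[1] for row in cur.fetchall()])  # row[1] holds the column name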

We are going to check the visits that are close to the zenith and with seeing ~ 0.7”

def get_visits(conn, min_alt=80, min_seeing=0.60, max_seeing=0.8):
    """
    Get all r-band visits in the OpSim database above a given altitude (in degrees)
    and within a given seeing range (in arcseconds).
    """
    cur = conn.cursor()
    # The OpSim database stores the altitude in radians
    cur.execute("SELECT obsHistID FROM ObsHistory WHERE altitude > ? and finSeeing > ? and finSeeing < ? and filter='r'",
                (np.radians(min_alt), min_seeing, max_seeing))
    return np.array(cur.fetchall()).ravel().astype(int)

We need the list of available visits to cross-check with the database

butler = lsst.daf.persistence.Butler(data_imsim)
datarefs = butler.subset('calexp') # This gives us a list with all `calexp`s
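
It can also help to peek at a few data IDs to see which keys are available (we only use the visit key below); a minimal sketch, assuming datarefs.cache holds plain dataId dicts, as it is used later in this notebook:

# Print the first few data IDs (e.g. visit/raft/sensor) in the calexp subset
for data_id in datarefs.cache[:3]:
    print(data_id)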

Let’s query the visits that have altitude larger than 80 degrees and seeing ~0.7”

good_visits = get_visits(conn)

And check all the visits that have been simulated

all_visits = np.unique(np.fromiter((visitId['visit'] for visitId in datarefs.cache), int))

Finally, we check which of the simulated visits are within the range of altitude and seeing that we are looking for.

visits_to_check = all_visits[np.in1d(all_visits, good_visits)]
print(len(good_visits), len(all_visits), len(visits_to_check))

There are 4750 r-band visits in the database within the range of altitude and seeing that we are looking for, and 1997 visits in our dataset. Only 22 of them are in r-band and within the seeing range that we want. We are going to query some of them using the SingleVisitCatalog class, as in the DC2_calexp_src_validation.ipynb tutorial, selecting only a few columns.

columns = ['base_ClassificationExtendedness_value', # basic star/galaxy separation flag (see Bosch et al. 2018 for details)
           'ext_shapeHSM_HsmShapeRegauss_e1', # e1 using GalSim's HSM
           'ext_shapeHSM_HsmShapeRegauss_e2', # e2 using GalSim's HSM
           'base_SdssShape_xx', # xx moment using SDSS algorithm
           'base_SdssShape_xy', # xy moment using SDSS algorithm
           'base_SdssShape_yy', # yy moment using SDSS algorithm
           'base_SdssShape_psf_xx', # xx moment of the PSF in the position of this object using SDSS algorithm
           'base_SdssShape_psf_xy', # xy as above
           'base_SdssShape_psf_yy', # yy as above
           'base_PsfFlux_instFlux', # PSF-flux
           'base_PsfFlux_instFluxErr', # PSF-flux error
           'base_PsfFlux_mag', # PSF magnitude (derived from base_PsfFlux_instFlux using the corresponding zeropoint)
           'base_PsfFlux_magErr', # error on the above magnitude
           'e_psf', # We are going to create this column with the PSF ellipticity below!
]
# We define some routines to convert to the LSST-SRD convention for |e|
def asymQ(ixx,iyy,ixy):
    asymQx = ixx - iyy
    asymQy = 2*ixy
    return np.sqrt(asymQx**2 + asymQy**2)
def trQ(ixx,iyy):
    return ixx+iyy
def get_a(ixx,iyy,ixy):
    return np.sqrt(0.5*(trQ(ixx,iyy)+asymQ(ixx,iyy,ixy)))
def get_b(ixx,iyy,ixy):
    return np.sqrt(0.5*(trQ(ixx,iyy)-asymQ(ixx,iyy,ixy)))
def get_e(ixx,iyy,ixy):
    """Distortion |e| = (a^2-b^2)/(a^2+b^2) = (1-q^2)/(1+q^2), with axis ratio q = b/a."""
    a = get_a(ixx,iyy,ixy)
    b = get_b(ixx,iyy,ixy)
    return (a**2-b**2)/(a**2+b**2)
nvisits_to_query = 3 # We are going to check 3 visits only but feel free to query up to 22!
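
As a quick sanity check of the |e| definition above (not part of the original validation): a round source should give e = 0, and second moments ixx = 2, iyy = 1, ixy = 0 give (a^2 - b^2)/(a^2 + b^2) = (2 - 1)/(2 + 1) = 1/3.

print(get_e(1.0, 1.0, 0.0))  # round source -> 0.0
print(get_e(2.0, 1.0, 0.0))  # -> 0.333...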

data = []
for i, vv in enumerate(visits_to_check):
    if i >= nvisits_to_query:
        break
    cat = SingleVisitCatalog(repo=data_imsim, filter_band='r', visit=vv, detector=None)
    # Add the derived PSF-ellipticity quantity to each per-visit catalog so it can be queried below
    cat.add_derived_quantity('e_psf', get_e, 
                             'base_SdssShape_psf_xx', 
                             'base_SdssShape_psf_yy', 
                             'base_SdssShape_psf_xy')
    data_this = cat.get_quantities(columns, filters=['deblend_nChild == 0'])
    data.append(data_this)

data = {q: np.concatenate([data_this[q] for data_this in data]) for q in data[0]}
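
Before plotting, it is worth a quick check of how much data we actually retrieved; a minimal sketch:

# Each entry of `data` is now one concatenated array spanning the queried visits
print(len(data['e_psf']), 'sources retrieved from the selected visits')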

Now let's look at the PSF ellipticity in our 3 visits.

We show the distribution, the median, and the 95th percentile. The requirements on these quantities are described in the LSST Science Requirements Document, LPM-17.
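
It can be useful to print the two statistics directly before plotting them; a minimal sketch computing the same quantities as the dashed lines below:

# Median and 95th percentile of the PSF ellipticity distribution
e_psf = np.abs(data['e_psf'])
print('median |e|:', np.median(e_psf))
print('95th percentile |e|:', np.percentile(e_psf, 95))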

plt.hist(np.abs(data['e_psf']), histtype='step', range=(0, 0.1), bins=100, density=True)
plt.plot(np.median(data['e_psf'])*np.ones(3), np.linspace(0, 65, 3), 'k--')
plt.plot(np.percentile(data['e_psf'], 95)*np.ones(3), np.linspace(0, 65, 3), 'r--')
plt.xlabel(r'$|e|=(1-q^{2})/(1+q^{2})$', fontsize=16)
plt.ylabel(r'$P(|e|)$', fontsize=16)
plt.ylim(0, 60)

The black line shows the median of the distribution (which should be below 0.04) and the red line the 95th percentile (which should be below 0.07). Both criteria are fulfilled! In fact, the PSF ellipticity was increased for Run 2.1i to make it more realistic.