Investigating Halo Quantities in the Skysim and CosmoDC2 Extragalactic Catalogs

Investigating Halo Quantities in the Skysim and CosmoDC2 Extragalactic Catalogs#

In this example script we show how to access and interpret the additional halo quantities available in the skysim catalog.

Owner: Patricia Larsen

Last verified run: April 26, 2024 by @patricialarsen

This notebook demonstrates how to access the extra galactic catalog through the Generic Catalog Reader (GCR, yymao/generic-catalog-reader) with a focus on skysim halo quantities. For more assistance with using GCR please see https://github.com/LSSTDESC/gcr-catalogs/blob/master/examples/GCRCatalogs Demo.ipynb

Objectives:

After working through and studying this Notebook you should be able to

Access the cosmoDC2 and skysim extragalactic catalogs through the GCR
Access cosmoDC2 halo quantities and understand how to use them
Access skysim-specific halo quantities and understand how to use them
Match cosmoDC2 and skysim galaxies to their host halos from the Outer Rim simulation
See FAQs for halo quantities

Logistics: This notebook is intended to be run through the JupyterHub NERSC interface available here: https://jupyter.nersc.gov. To setup your NERSC environment, please follow the instructions available here: https://confluence.slac.stanford.edu/display/LSSTDESC/Using+Jupyter+at+NERSC

NOTE: FAQs are listed at the bottom of this page, please look there for known issues before flagging any issues in the halo catalogs

import GCRCatalogs
import numpy as np
from astropy.table import Table
from GCR import GCRQuery
import matplotlib.pyplot as plt
%matplotlib inline

Reading the catalogs#

We load in the catalogs with the “load_catalog” command, and then the values with the “get_quantities” command using filters to select sub-samples of the catalog.

Help for error messages: If this fails to find the appropriate quantities, check that the desc-python kernel is being used and if this is not available source the kernels by running the following command on a terminal at nersc: “source /global/common/software/lsst/common/miniconda/kernels/setup.sh”

Note that we are loading in a small version of the catalogs using the _small versions - these contain the same information as the full catalog but with a smaller sky area.

gc_cdc2 = GCRCatalogs.load_catalog('cosmoDC2_v1.1.4_small')
gc_sky = GCRCatalogs.load_catalog('skysim5000_v1.2_small')

Non-native Quantities#

Let’s first discuss the non-native quantities. These are the halo mass (in Msun) and halo ID (note that you can then also access their positions and velocities through looking at the halo’s central galaxies using an “is_central” flag).

for item in gc_cdc2.list_all_quantities():
    if 'halo' in item:
        print(item)

Native Quantities#

You can access further information from the native_quantities. To understand these we must talk a little about the procedure to add galaxies to the simulation. The base of both Skysim and cosmoDC2 is the Outer Rim simulation, which defines the halo positions on the sky. This Outer Rim halo is referred to in the native quantities as the target halo. This is an FOF halo measured with a linking length of b=0.168, with a central position given by the minimum potential point.

To add galaxies to this simulation, we use a Monte Carlo resampling of galaxies from the Universe Machine simulation. In this, galaxies are assigned to halos from the MultiDark Planck 2 (MDPL2), using the Universe Machine prescription, tuned to observations. This MDPL2 halo is referred to as the host halo.

Briefly, for every host halo in the Outer Rim, we randomly select a host halo in MDPL2 of similar mass, and map the galaxy content of the selected MDPL2 halo into the Outer Rim halo, preserving the halo-centric positions and velocities of the galaxies.

The source halos also pertain to the UMachine catalog, and give the ID and mass values for matching to MDPL2.

We then add extra properties by running the Galacticus semi-analytic model on the Alpha Quadrant (a smaller analog to Outer Rim), and creating a set of galaxies with properly correlated properties which are used as a library from which we select in color and redshift space to add complexity.

Generally speaking for halo properties you’ll want to look at those of the target halo, however the host halo information for the matching halo from the Universe Machine simulation is retained as well. Please look at https://arxiv.org/pdf/1907.06530 if this is confusing, especially Figure 1.

for item in gc_cdc2.list_all_native_quantities():
    if 'target_halo' in item:
        print(item)
print(' ')
for item in gc_cdc2.list_all_native_quantities():
    if 'host_halo' in item:
        print(item)
print(' ')
for item in gc_cdc2.list_all_native_quantities():
    if 'source_halo' in item:
        print(item)

Properties#

The target halo information has

redshift
central potential point x,y,z in comoving Mpc/h
mean halo velocity vx, vy, vz in km/s
halo mass in Msun/h
halo ID -> a cosmoDC2-specific halo ID
fof halo ID -> the fof halo tag corresponding to the Outer-Rim simulation (note this is only unique per simulation step)

The host halo (where the host-centric galaxy positions come from) information has

central position in MDPL2 simulation x,y,z in comoving Mpc/h
mean halo velocity vx, vy, vz in km/s
halo mass in Msun/h

The source halo information has

source halo id (halo tag from the MDPL2 simulation)
halo mass in Msun/h

Notes: You will occassionally find fill values, this happens in several situations, e.g. synthetic field galaxies added without host halos. These are typically either 0 or -1. While GCR quantities are typically converted to GCR conventions, native quantities are not, so please double check units (especially note that GCR masses are in Msun, native quantities are in Msun/h)

# let's add some plots to better understand these here
# start by asking for a representative set of target halo quantities and normal information for a small set of cluster-mass halos
# we set the is_central flag to ensure we only have one galaxy per halo

cluster_data = gc_cdc2.get_quantities(['ra','dec', 'x','vx', 'redshift', 'halo_mass', 'halo_id','baseDC2/target_halo_mass','baseDC2/target_halo_id',
                                      'baseDC2/target_halo_x','baseDC2/target_halo_vx', 'baseDC2/target_halo_redshift','baseDC2/source_halo_mvir',
                                      'baseDC2/host_halo_mvir'], 
                                 filters=['is_central', 'halo_mass > 1e14', 'redshift < 0.2']) 

Firstly let’s check that the masses of the host halos match exactly to the source halos, as these should be duplicate data:

(cluster_data['baseDC2/host_halo_mvir']==cluster_data['baseDC2/source_halo_mvir']).all()

We can then check that the x positions and velocities from the central galaxies line up with the x positions and velocities of the halos

plt.figure()
plt.scatter(cluster_data['x'], cluster_data['baseDC2/target_halo_x'])
plt.plot(np.linspace(45,250,200),np.linspace(45,250,200),'k--')
plt.show()
plt.figure()
plt.scatter(cluster_data['vx'], cluster_data['baseDC2/target_halo_vx'])
plt.plot(np.linspace(-400,800,200),np.linspace(-400,800,200),'k--')
plt.show()

Next let’s look at the redshift - there’s more scatter there because the line-of-sight velocity is also taken into account in the redshift measurement, and the halo redshift is the cosmological redshift (you can get these to match by selecting the cosmological redshift for the galaxies).

The masses line up only if you take into account the change in units between halo_mass (in Msun) and the target_halo_mass (in Msun/h).

h = gc_cdc2.cosmology.H0.value/100.

plt.figure()
plt.scatter(cluster_data['redshift'], cluster_data['baseDC2/target_halo_redshift'])
plt.plot(np.linspace(0.18,0.2,200),np.linspace(0.18,0.2,200),'k--')
plt.xlim([0.18,0.2])
plt.ylim([0.18,0.2])

plt.show()
plt.figure()
plt.scatter(np.log10(cluster_data['halo_mass']), np.log10(cluster_data['baseDC2/target_halo_mass']/h)) # note h factor here
plt.plot(np.linspace(14,15.5,200),np.linspace(14,15.5,200),'k--')
plt.show()

Finally we can see that the halo id matches with the target halo id (note this likely doesn’t match with the fof halo id as the id convention has been altered to ensure unique identifiers)

print((cluster_data['halo_id']==cluster_data['baseDC2/target_halo_id']).all())

Skysim#

We take this one step farther with skysim by addding access to extra quantities of the target halos. There are two types of labels here, sod_halo quantities which are M200c quantities from the Outer Rim simulation, and target_halo quantities which are later derived quantities

sod_halo_mass is the M200c value for the halo (Msun/h)
sod_halo_radius is the R200c value (Mpc/h comoving)
sod_halo_cdelta is the concentration c (unitless)
sod_halo_cdelta_error is the error in the concentration measurement

For the target halo quantities we then add further information on their orientation and ellipcitity. Note that these quantities only exist from the simulation above a certain mass limit, which is 10,000 particles or a mass of 10^13 (not the right number, add here) or above. Below this we orient galaxies according to a random draw from a distribution, and the corresponding values are given instead.

These are

target_halo_axis_A_x,y,z - direction of the halo major axis (eigenvector of the inertia tensor)
target_halo_axis_A,B,C_length - the length of the halo axes in Mpc/h
target_halo_axis_ellipticity,prolaticity -> look at the exact axis ratio to e/p conversion in the github here: LSSTDESC/cosmodc2

*note these are from I believe the simple inertia tensor of the FOF halo particles

for item in gc_sky.list_all_native_quantities():
    if item not in gc_cdc2.list_all_native_quantities():
        if 'target_halo' in item:
            print(item)
print(' ')
for item in gc_sky.list_all_native_quantities():
    if item not in gc_cdc2.list_all_native_quantities():
        if 'sod_halo' in item:
            print(item)

Let’s plot some skysim data

cluster_data = gc_sky.get_quantities(['ra','dec', 'x','vx', 'redshift', 'halo_mass', 'halo_id',
                                       'baseDC2/target_halo_mass','baseDC2/sod_halo_mass','baseDC2/sod_halo_cdelta','baseDC2/sod_halo_radius',
                                      'baseDC2/target_halo_ellipticity','baseDC2/target_halo_prolaticity', 'baseDC2/target_halo_axis_A_length',
                                     'baseDC2/target_halo_axis_B_length', 'baseDC2/target_halo_axis_C_length'], 
                                 filters=['is_central', 'halo_mass > 1e14', 'redshift < 0.2']) 

First let’s look at the standard SOD quantiites - SOD masses should roughly agree with FOF masses, with a fair bit of scatter. The concentrations should roughly have values between 1 and 10 with a large amount of scatter. Fill values for the concentration are -1 for a failed measurement and -101 for a halo which doesn’t have SOD measurements.

h = gc_sky.cosmology.H0.value/100.

plt.figure()
plt.scatter(np.log10(cluster_data['halo_mass']), np.log10(cluster_data['baseDC2/sod_halo_mass']/h) # note h here
plt.plot(np.linspace(14,15.5,200),np.linspace(14,15.5,200),'k--')
plt.show()
plt.figure()
plt.scatter(np.log10(cluster_data['baseDC2/sod_halo_mass']/h), cluster_data['baseDC2/sod_halo_cdelta']) # note h here
plt.show()

Now let’s look at the axis length, this should be on the same order of magnitude as the r200 value

plt.figure()
plt.scatter(cluster_data['baseDC2/sod_halo_radius'],cluster_data['baseDC2/target_halo_axis_A_length']) # note check for h-unit and a questions
plt.plot(np.linspace(0.4,1.5,200),np.linspace(0.4,1.5,200),'k--')
plt.show()

Finally ellipticity and prolaticity should be dependent on the axis ratios - we take the equations exactly from the code to confirm the definitions

def calculate_ellipticity_prolaticity_from_axis_ratios(b, c):
    """
    """
    b = np.atleast_1d(b)
    c = np.atleast_1d(c)
    assert np.all(b > 0), "b must be strictly positive"
    assert np.all(b <= 1), "b cannot exceed unity"
    assert np.all(c > 0), "c must be strictly positive"
    assert np.all(b >= c), "c cannot exceed b"

    lam = 1. + b**2 + c**2
    num = 1. - c**2
    denom = 2*lam
    e = num/denom
    p = (1. - 2*b**2 + c**2)/denom
    return e, p

ax1 = cluster_data['baseDC2/target_halo_axis_C_length']/cluster_data['baseDC2/target_halo_axis_A_length']
ax2 = cluster_data['baseDC2/target_halo_axis_B_length']/cluster_data['baseDC2/target_halo_axis_A_length']
ellip,prolat = calculate_ellipticity_prolaticity_from_axis_ratios(ax2, ax1)

plt.figure()
plt.scatter(cluster_data['baseDC2/target_halo_ellipticity'],ellip) # note check for h-unit and a questions
plt.plot(np.linspace(0.0,0.5,200),np.linspace(0.0,0.5,200),'k--')
plt.show()

We leave it as an exercise to the reader to use and investigate the properties however you’d like, but before we end this tutorial let’s match up the galaxies to their host halos to put it all together

# first we read in both the galaxy data and the halo data (halo data involves cutting on an is_central flag to keep unique halos
# and here we've cut on halo mass and redshift as well to match to specific clusters)

galaxy_data = gc_sky.get_quantities(['ra', 'dec', 'mag_r', 'halo_id'], filters=['mag_r < 19','redshift < 0.25'])
cluster_data = gc_sky.get_quantities(['ra','dec', 'halo_mass', 'halo_id', 'baseDC2/sod_halo_mass','baseDC2/target_halo_ellipticity'], 
                                 filters=['is_central', 'halo_mass > 1e14', 'redshift < 0.2'])

We now plot galaxies in the first three halos, matching using the halo id, as in other tutorials, but this time also printing the sod halo mass

cluster_data = Table(cluster_data)
for i, cluster in enumerate(cluster_data):
    if (i >= 3):
        break # plot only the first 3
    members = GCRQuery('halo_id == {}'.format(cluster['halo_id'])).filter(galaxy_data)
    plt.figure()
    plt.scatter(
        members['ra'], 
        members['dec'], 
        s=(19-members['mag_r'])*8, 
        label='Galaxy Members [{}]'.format(len(members['ra']))
    )
    plt.plot(cluster['ra'], cluster['dec'], 'xr', label='Cluster Center')
    plt.legend(loc='best', framealpha=0.3)
    plt.xlabel(r'ra [deg]')
    plt.ylabel(r'dec [deg]')
    plt.title('Halo ID:  {}\nHalo Mass:  {:.2e} Msun\nSOD halo mass: {:.2e} Msun/h'.format(cluster['halo_id'], cluster['halo_mass'], cluster['baseDC2/sod_halo_mass']))
plt.show()

FAQs - CosmoDC2#

Known issues/ limitations

The color distribution has some discreteness?
- This is a limitation of the sampling method, causing some redshift-related discreteness. See the paper for more details
The red sequence looks too strong?
- This had to be amplified for redmapper to run successfully. This has complicated selection functions and we couldn’t both maintain full realism and meet the requirement for redmapper to run with the person-power available. If you need more realistic colors please consider using the more recent Roman-Rubin simulations
I found some odd duplicate clusters?
- There was a bug in cosmoDC2 where ranks disagreed as to the ownership of some fragment halos, this led to a very small number of doubled-up halos in the map. Since this exists in the image simulation we chose not to fix this bug in cosmoDC2 for consistency. To find this you could cut on halo mass and then see if you have two halos of the same mass with different IDs in the same location and exclude those from your sample, but the recommended solution if you don’t need images is to use skysim to look at the halo catalog which doesn’t have this issue and has a much larger sky area so more clusters to look at.

IDS

Why are there negative target_halo_ids?
- Negative target_halo_ids exist when the merger-tree object associated with the halo is a fragment halo, this happens when we want to distinguish fly-by type events from true mergers. You can extract the FOF halo tag from this fragment ID.
- The target_halo_fof_halo_id value for these fragment objects is assigned to be 0
Why does the distribution of target_halo_ids look weird?
- the fragment IDs use bit masking to combine the FOF halo id with a fragment index, this results in a slightly strange looking distribution for fragment IDs.
Why is my source_halo_id -1 or 0?
- These are fill values, there is no source halo from the Umachine catalog. This could be a synthetic source added to amplify the cluster red sequence, or a background unclustered galaxy added to adjust the number density (in this case target_halo_id may also be -1 or 0).

Masses

Why are the halo masses different to the target halo masses?
- See above notes on units, the halo mass is in Msun, and the target halo mass is in Msun/h (h=0.71 for this simulation)

Simulation

Where can I find information on the cosmology/ other details of the Outer Rim simulation? Or of cosmoDC2?
- The best place to look is the cosmoDC2 paper, found here: https://arxiv.org/pdf/1907.06530
- More information can be found in the Outer Rim paper: https://arxiv.org/abs/1904.11970
Can I get the base simulation data?
- Yes, a subset of the data can be found here https://cosmology.alcf.anl.gov/ this includes a sample of snapshot data, and lightcone data including halo catalogs

FAQs - Skysim#

What is the difference between CosmoDC2 and skysim?
- Skysim is a more recent and much larger extragalactic catalog, incorporating improvements and bug fixes to cosmoDC2 using the same or similar base methods and simulations. These include higher resolution weak lensing, fixes to the lensing deflection angles, color model improvements, triaxiality, and additional halo information.