Example 5: Using a real dataset (DES)
Fit halo mass to shear profile using DES data
the LSST-DESC CLMM team
This notebook can be run on NERSC.
Here we demonstrate how to run CLMM on real observational datasets. As an example, we use the data from the Dark Energy Survey (DES) public releases. The catalogs can be accessed from the NOIRLab Astro Data Lab.
The steps in this notebook include:
- Setting things up
- Selecting a cluster
- Downloading the published catalog at the cluster field
- Loading the catalog into CLMM
- Running CLMM on the dataset
Acknowledgements
DES data: https://des.ncsa.illinois.edu/thanks
Astro Data Lab: https://datalab.noirlab.edu/acknowledgements.php
## 1. Setup
We import packages.
import numpy as np
import matplotlib.pyplot as plt
from astropy.table import Table
import pickle as pkl
from pathlib import Path
## 2. Selecting a cluster
We use the DES Y1 redMaPPer catalog (https://des.ncsa.illinois.edu/releases/y1a1/key-catalogs/key-redmapper; McClintock et al. 2019) to select a list of high-richness (LAMBDA) galaxy clusters, which are likely to have high masses; a script sketch for this selection follows the table below.
| Name | RA (deg) | DEC (deg) | Z_LAMBDA | LAMBDA |
| --- | --- | --- | --- | --- |
| RMJ025415.5-585710.7 | 43.564574 | -58.95297 | 0.429804 | 234.50368 |
| RMJ051637.4-543001.6 | 79.155704 | -54.500456 | 0.30416065 | 195.06956 |
| RMJ224851.8-443106.3 | 342.215897 | -44.518403 | 0.3514858 | 178.83827 |
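For reference, this selection can also be scripted once the redMaPPer catalog has been downloaded. Below is a minimal sketch assuming a local FITS copy; the file name is hypothetical, and the column names follow the table above but should be checked against the actual file.

import numpy as np
from astropy.table import Table

# Hypothetical local copy of the DES Y1 redMaPPer cluster catalog.
redmapper = Table.read("y1a1_redmapper.fits")

# Select the three richest clusters (descending LAMBDA); high richness
# is a proxy for high mass.
richest = redmapper[np.argsort(redmapper["LAMBDA"])[::-1][:3]]
print(richest["NAME", "RA", "DEC", "Z_LAMBDA", "LAMBDA"])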
## 3. Downloading the catalog at the cluster field
We consider RMJ051637.4-543001.6 (ACO S520) as an example. We can access the DES catalogs from the NOIRLab Astro Data Lab (https://datalab.noirlab.edu/query.php?name=des_dr1.shape_metacal_riz_unblind); no registration is required. We make the query and download the catalog in the “Query Interface”. We use coadd_objects_id to cross-match the shape catalog with the photo-z catalog (Hoyle et al. 2018; https://datalab.noirlab.edu/query.php?name=des_dr1.photo_z). Since the cluster is at redshift z ≈ 0.3, a search radius of 0.3 deg corresponds to a projected distance of about 5 Mpc. The final catalog includes both shape and photo-z information. Here is an example of the SQL query (a programmatic alternative using the Data Lab Python client is sketched after it); the query can take a few minutes to run, and the resulting catalog is about 1.4 MB (.csv).
SELECT P.mean_z,
C.ra, C.dec, C.e1, C.e2, C.r11, C.r12, C.r21, C.r22
FROM des_dr1.photo_z as P
INNER JOIN des_dr1.shape_metacal_riz_unblind as C
ON P.coadd_objects_id=C.coadd_objects_id
WHERE 't' = Q3C_RADIAL_QUERY(C.ra, C.dec, 79.155704, -54.500456, 0.3)
AND P.minchi2<1
AND P.z_sigma<0.1
AND C.flags_select=0
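The same query can also be submitted programmatically with the Data Lab Python client instead of the web Query Interface. A minimal sketch, assuming the astro-datalab package is installed (pip install astro-datalab):

from dl import queryClient as qc

sql = """
SELECT P.mean_z,
C.ra, C.dec, C.e1, C.e2, C.r11, C.r12, C.r21, C.r22
FROM des_dr1.photo_z AS P
INNER JOIN des_dr1.shape_metacal_riz_unblind AS C
ON P.coadd_objects_id = C.coadd_objects_id
WHERE 't' = Q3C_RADIAL_QUERY(C.ra, C.dec, 79.155704, -54.500456, 0.3)
AND P.minchi2 < 1
AND P.z_sigma < 0.1
AND C.flags_select = 0
"""
# qc.query returns the result as a CSV-formatted string by default.
result = qc.query(sql=sql, fmt="csv")
with open("ACOS520_DES.csv", "w") as f:
    f.write(result)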
## 4. Loading the catalog into CLMM
Once we have the catalog, we read in the catalog, make cuts on the catalog, and adjust column names to prepare for the analysis in CLMM.
%%time
# Assume the downloaded catalog is at this path:
filename = "ACOS520_DES.csv"
catalog = filename.replace(".csv", ".pkl")  # cached pickle for faster re-reads
if not Path(catalog).is_file():
data_0 = Table.read(filename, format="ascii.csv")
pkl.dump(data_0, open(catalog, "wb"))
else:
data_0 = pkl.load(open(catalog, "rb"))
CPU times: user 0 ns, sys: 3.52 ms, total: 3.52 ms
Wall time: 3.12 ms
print(data_0.colnames)
['mean_z', 'ra', 'dec', 'e1', 'e2', 'r11', 'r12', 'r21', 'r22']
Shear response

Shears in the DES data have been measured with the metacal method (Zuntz et al. 2018), and the catalog provides the shear response terms (\(r_{11}\), \(r_{22}\), \(r_{12}\), \(r_{21}\)) required to calibrate the shear values.
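Schematically, in the metacalibration formalism the mean measured ellipticities relate to the shear through the response matrix, and the calibration applied below divides the ellipticities by the mean diagonal response:

\[
\langle \mathbf{e} \rangle \approx \langle \mathsf{R} \rangle \, \boldsymbol{\gamma},
\qquad
\mathsf{R} =
\begin{pmatrix}
r_{11} & r_{12} \\
r_{21} & r_{22}
\end{pmatrix},
\qquad
g_i \approx \frac{e_i}{\bar{R}},
\quad
\bar{R} = \tfrac{1}{2}\left(\langle r_{11} \rangle + \langle r_{22} \rangle\right).
\]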
print(np.mean(data_0["r11"]), np.mean(data_0["r22"]))
print(np.mean(data_0["r12"]), np.mean(data_0["r21"]))
r_diag = np.mean([np.mean(data_0["r11"]), np.mean(data_0["r22"])])
r_off_diag = np.mean([np.mean(data_0["r12"]), np.mean(data_0["r21"])])
print(r_diag, r_off_diag, r_off_diag / r_diag)
0.7105854684953065 0.7070498820143473
-0.017056656529555746 0.008455599747631376
0.7088176752548269 -0.004300528390962185 -0.006067185598068083
The two diagonal terms are close to each other, and the off-diagonal terms are much smaller (<1%). We therefore calibrate with the mean of the diagonal terms, which also reduces noise. We also neglect the selection-bias correction, since it is typically at the percent level.
# Adjust column names.
def adjust_column_names(catalog_in):
    # Map output column names (keys) to input catalog column names (values).
    # Note: shear calibration (division by the mean response) is applied below.
column_name_map = {
"ra": "ra",
"dec": "dec",
"z": "mean_z",
"e1": "e1",
"e2": "e2",
}
catalog_out = Table()
for i in column_name_map:
catalog_out[i] = catalog_in[column_name_map[i]]
catalog_out["e1"] /= r_diag
catalog_out["e2"] /= r_diag
return catalog_out
obs_galaxies = adjust_column_names(data_0)
select = obs_galaxies["e1"] ** 2 + obs_galaxies["e2"] ** 2 <= 1.0
print(np.sum(~select))
obs_galaxies = obs_galaxies[select]
0
Basic visualization
def make_plots(catalog_in):
# Scatter plot
plt.figure()
plt.scatter(catalog_in["ra"], catalog_in["dec"], c=catalog_in["z"], s=1.0, alpha=1)
cb = plt.colorbar()
plt.xlabel("ra")
plt.ylabel("dec")
cb.ax.set_title("z")
# Histogram
plt.figure()
plt.hist(catalog_in["z"], bins=20)
plt.xlabel("z")
plt.ylabel("count")
# Relation
plt.figure()
plt.scatter(catalog_in["e1"], catalog_in["e2"], s=1.0, alpha=0.2)
plt.xlabel("e1")
plt.ylabel("e2")
make_plots(obs_galaxies)
(Figures: galaxy positions in RA-Dec colored by redshift; redshift histogram; e1 vs e2 distribution.)
## 5. Running CLMM on the dataset

We use functions similar to those in examples/Paper_v1.0/gt_and_use_case.ipynb.
Make a galaxy cluster object
from clmm import Cosmology
import clmm
cosmo = Cosmology(H0=70.0, Omega_dm0=0.27 - 0.045, Omega_b0=0.045, Omega_k0=0.0)  # total Omega_m0 = 0.27
# We consider RMJ051637.4-543001.6 (ACO S520)
cluster_z = 0.30416065 # Cluster redshift
cluster_ra = 79.155704 # Cluster Ra in deg
cluster_dec = -54.500456 # Cluster Dec in deg
# Select background galaxies
obs_galaxies = obs_galaxies[(obs_galaxies["z"] > (cluster_z + 0.1)) & (obs_galaxies["z"] < 1.5)]
obs_galaxies["id"] = np.arange(len(obs_galaxies))
# Put galaxy values on arrays
gal_ra = obs_galaxies["ra"] # Galaxies Ra in deg
gal_dec = obs_galaxies["dec"] # Galaxies Dec in deg
gal_e1 = obs_galaxies["e1"]  # Galaxies ellipticity 1
gal_e2 = obs_galaxies["e2"]  # Galaxies ellipticity 2
gal_z = obs_galaxies["z"] # Galaxies observed redshift
gal_id = obs_galaxies["id"] # Galaxies ID
# Create a GCData with the galaxies.
galaxies = clmm.GCData(
[gal_ra, gal_dec, gal_e1, gal_e2, gal_z, gal_id], names=["ra", "dec", "e1", "e2", "z", "id"]
)
# Create a GalaxyCluster.
cluster = clmm.GalaxyCluster("RMJ051637.4-543001.6", cluster_ra, cluster_dec, cluster_z, galaxies)
Measure the shear profile
import clmm.dataops as da
# Convert ellipticities into shears for the members.
cluster.compute_tangential_and_cross_components()
print(cluster.galcat.colnames)
# Measure profile and add profile table to the cluster.
cluster.make_radial_profile(
bins=da.make_bins(0.2, 5.0, 7, method="evenlog10width"),
bin_units="Mpc",
cosmo=cosmo,
include_empty_bins=False,
gal_ids_in_bins=True,
)
print(cluster.profile.colnames)
['ra', 'dec', 'e1', 'e2', 'z', 'id', 'theta', 'et', 'ex']
['radius_min', 'radius', 'radius_max', 'gt', 'gt_err', 'gx', 'gx_err', 'z', 'z_err', 'n_src', 'W_l', 'gal_id']
fig, ax = plt.subplots(figsize=(7, 5), ncols=1, nrows=1)
errorbar_kwargs = dict(linestyle="", marker="o", markersize=1, elinewidth=0.5, capthick=0.5)
ax.errorbar(
cluster.profile["radius"],
cluster.profile["gt"],
cluster.profile["gt_err"],
c="k",
**errorbar_kwargs
)
ax.set_xlabel("R [Mpc]", fontsize=10)
ax.set_ylabel(r"$g_t$", fontsize=10)
ax.set_xscale("log")
ax.grid(lw=0.3)
ax.minorticks_on()
ax.grid(which="minor", lw=0.1)
plt.show()
(Figure: measured tangential reduced shear profile, g_t vs R.)
Theoretical predictions
from clmm.utils import convert_units
# Model relying on the overall redshift distribution of the sources (WtG III Applegate et al. 2014).
z_inf = 1000
concentration = 4.0
bs_mean = np.mean(clmm.utils.compute_beta_s(cluster.galcat["z"], cluster_z, z_inf, cosmo))
bs2_mean = np.mean(clmm.utils.compute_beta_s(cluster.galcat["z"], cluster_z, z_inf, cosmo) ** 2)
def predict_reduced_tangential_shear_redshift_distribution(profile, logm):
gt = clmm.compute_reduced_tangential_shear(
r_proj=profile["radius"], # Radial component of the profile
mdelta=10**logm, # Mass of the cluster [M_sun]
cdelta=concentration, # Concentration of the cluster
z_cluster=cluster_z, # Redshift of the cluster
z_src=(bs_mean, bs2_mean), # tuple of (bs_mean, bs2_mean)
z_src_info="beta",
approx="order1",
cosmo=cosmo,
delta_mdef=200,
massdef="critical",
halo_profile_model="nfw",
)
return gt
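Here \(\beta\) is the geometric lensing efficiency and \(\beta_s\) its value normalized to a distant reference source plane (z_inf = 1000 above); schematically,

\[
\beta(z_s) = \frac{D_{LS}}{D_S},
\qquad
\beta_s(z_s) = \frac{\beta(z_s)}{\beta(z_\infty)},
\]

and the order-1 approximation models the mean reduced shear of the source population using \(\langle \beta_s \rangle\) and \(\langle \beta_s^2 \rangle\).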
# Model using individual redshift and radial information, to compute the averaged shear in each radial bin, based on the galaxies actually present in that bin.
cluster.galcat["theta_mpc"] = convert_units(
cluster.galcat["theta"], "radians", "mpc", cluster.z, cosmo
)
def predict_reduced_tangential_shear_individual_redshift(profile, logm):
return np.array(
[
np.mean(
clmm.compute_reduced_tangential_shear(
# Radial component of each source galaxy inside the radial bin
r_proj=cluster.galcat[radial_bin["gal_id"]]["theta_mpc"],
mdelta=10**logm, # Mass of the cluster [M_sun]
cdelta=concentration, # Concentration of the cluster
z_cluster=cluster_z, # Redshift of the cluster
# Redshift value of each source galaxy inside the radial bin
z_src=cluster.galcat[radial_bin["gal_id"]]["z"],
cosmo=cosmo,
delta_mdef=200,
massdef="critical",
halo_profile_model="nfw",
)
)
for radial_bin in profile
]
)
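Both prediction functions share the signature (profile, logm), so they can be compared directly before fitting. For example, with an arbitrary trial mass (illustrative only; the fits below determine the actual best-fit masses):

# Evaluate both models at a hypothetical trial mass of 10^14.5 Msun.
logm_trial = 14.5
gt_trial_nz = predict_reduced_tangential_shear_redshift_distribution(cluster.profile, logm_trial)
gt_trial_zr = predict_reduced_tangential_shear_individual_redshift(cluster.profile, logm_trial)
print(gt_trial_nz / gt_trial_zr)  # per-bin ratio of the two predictions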
Mass fitting
mask_for_fit = cluster.profile["n_src"] > 2
data_for_fit = cluster.profile[mask_for_fit]
from clmm.support.sampler import fitters
def fit_mass(predict_function):
popt, pcov = fitters["curve_fit"](
predict_function,
data_for_fit,
data_for_fit["gt"],
data_for_fit["gt_err"],
bounds=[10.0, 17.0],
)
logm, logm_err = popt[0], np.sqrt(pcov[0][0])
return {
"logm": logm,
"logm_err": logm_err,
"m": 10**logm,
"m_err": (10**logm) * logm_err * np.log(10),
}
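For reference, fitters["curve_fit"] wraps scipy.optimize.curve_fit, so an equivalent direct call would look like the sketch below (assuming the wrapper adds no extra processing; scipy forwards a non-array xdata object unchanged to the model function):

from scipy.optimize import curve_fit

popt, pcov = curve_fit(
    predict_reduced_tangential_shear_redshift_distribution,
    data_for_fit,  # forwarded to the model as `profile`
    data_for_fit["gt"],  # measured reduced shear per bin
    sigma=data_for_fit["gt_err"],  # per-bin uncertainties
    bounds=(10.0, 17.0),  # bounds on log10(M)
)
logm_best = popt[0]  # best-fit log10(M)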
%%time
fit_redshift_distribution = fit_mass(predict_reduced_tangential_shear_redshift_distribution)
fit_individual_redshift = fit_mass(predict_reduced_tangential_shear_individual_redshift)
CPU times: user 258 ms, sys: 1.15 ms, total: 259 ms
Wall time: 259 ms
print(
"Best fit mass for N(z) model ="
f' {fit_redshift_distribution["m"]:.3e} +/- {fit_redshift_distribution["m_err"]:.3e} Msun'
)
print(
"Best fit mass for individual redshift and radius ="
f' {fit_individual_redshift["m"]:.3e} +/- {fit_individual_redshift["m_err"]:.3e} Msun'
)
Best fit mass for N(z) model = 3.210e+14 +/- 1.129e+14 Msun
Best fit mass for individual redshift and radius = 5.172e+14 +/- 1.716e+14 Msun
Visualization of the results
def get_predicted_shear(predict_function, fit_values):
gt_est = predict_function(data_for_fit, fit_values["logm"])
gt_est_err = [
predict_function(data_for_fit, fit_values["logm"] + i * fit_values["logm_err"])
for i in (-3, 3)
]
return gt_est, gt_est_err
gt_redshift_distribution, gt_err_redshift_distribution = get_predicted_shear(
predict_reduced_tangential_shear_redshift_distribution, fit_redshift_distribution
)
gt_individual_redshift, gt_err_individual_redshift = get_predicted_shear(
predict_reduced_tangential_shear_individual_redshift, fit_individual_redshift
)
chi2_redshift_distribution_dof = np.sum(
(gt_redshift_distribution - data_for_fit["gt"]) ** 2 / (data_for_fit["gt_err"]) ** 2
) / (len(data_for_fit) - 1)
chi2_individual_redshift_dof = np.sum(
(gt_individual_redshift - data_for_fit["gt"]) ** 2 / (data_for_fit["gt_err"]) ** 2
) / (len(data_for_fit) - 1)
print(f"Reduced chi2 (N(z) model) = {chi2_redshift_distribution_dof}")
print(f"Reduced chi2 (individual (R,z) model) = {chi2_individual_redshift_dof}")
Reduced chi2 (N(z) model) = 4.780165089363783
Reduced chi2 (individual (R,z) model) = 4.411023500984584
fig, axes = plt.subplots(
nrows=2,
ncols=1,
figsize=(8, 6),
gridspec_kw={"height_ratios": [3, 1], "wspace": 0.4, "hspace": 0.03},
)
axes[0].errorbar(
data_for_fit["radius"], data_for_fit["gt"], data_for_fit["gt_err"], c="k", **errorbar_kwargs
)
# Points in grey have not been used for the fit.
axes[0].errorbar(
cluster.profile["radius"][~mask_for_fit],
cluster.profile["gt"][~mask_for_fit],
cluster.profile["gt_err"][~mask_for_fit],
c="grey",
**errorbar_kwargs,
)
pow10 = 14
mlabel = lambda name, fits: (
rf"$M_{{fit}}^{{{name}}} = "
rf'{fits["m"]/10**pow10:.3f}\pm'
rf'{fits["m_err"]/10**pow10:.3f}'
rf"\times 10^{{{pow10}}} M_\odot$"
)
# The model for the 1st method.
axes[0].loglog(
data_for_fit["radius"],
gt_redshift_distribution,
"-C1",
label=mlabel("N(z)", fit_redshift_distribution),
lw=0.5,
)
axes[0].fill_between(
data_for_fit["radius"], *gt_err_redshift_distribution, lw=0, color="C1", alpha=0.2
)
# The model for the 2nd method.
axes[0].loglog(
data_for_fit["radius"],
gt_individual_redshift,
"-C2",
label=mlabel("z,R", fit_individual_redshift),
lw=0.5,
)
axes[0].fill_between(
data_for_fit["radius"], *gt_err_individual_redshift, lw=0, color="C2", alpha=0.2
)
axes[0].set_ylabel(r"$g_t$", fontsize=12)
axes[0].legend(fontsize=12, loc=4)
axes[0].set_xticklabels([])
axes[0].tick_params("x", labelsize=12)
axes[0].tick_params("y", labelsize=12)
axes[0].set_ylim(1.0e-3, 0.5)
errorbar_kwargs2 = {k: v for k, v in errorbar_kwargs.items() if "marker" not in k}
errorbar_kwargs2["markersize"] = 3
errorbar_kwargs2["markeredgewidth"] = 0.5
delta = (cluster.profile["radius"][1] / cluster.profile["radius"][0]) ** 0.15
axes[1].errorbar(
data_for_fit["radius"],
data_for_fit["gt"] / gt_redshift_distribution - 1,
yerr=data_for_fit["gt_err"] / gt_redshift_distribution,
marker="s",
c="C1",
**errorbar_kwargs2,
)
errorbar_kwargs2["markersize"] = 3
errorbar_kwargs2["markeredgewidth"] = 0.5
axes[1].errorbar(
data_for_fit["radius"] * delta,
data_for_fit["gt"] / gt_individual_redshift - 1,
yerr=data_for_fit["gt_err"] / gt_individual_redshift,
marker="*",
c="C2",
**errorbar_kwargs2,
)
axes[1].set_xlabel(r"$R$ [Mpc]", fontsize=12)
axes[1].set_ylabel(r"$g_t^{data}/g_t^{mod.}-1$", fontsize=12)
axes[1].set_xscale("log")
axes[1].set_ylim(-5, 5)
axes[1].tick_params("x", labelsize=12)
axes[1].tick_params("y", labelsize=12)
for ax in axes:
ax.grid(lw=0.3)
ax.minorticks_on()
ax.grid(which="minor", lw=0.1)
plt.show()
# Note since we made cuts on the catalog, the redshift distribution of the remaining sources might not be representative.
(Figure: measured profile with both best-fit models, and relative residuals.)
References
Zuntz J., Sheldon E., Samuroff S., Troxel M. A., Jarvis M., MacCrann N., Gruen D., et al., 2018, MNRAS, 481, 1149. doi:10.1093/mnras/sty2219
Hoyle B., Gruen D., Bernstein G. M., Rau M. M., De Vicente J., Hartley W. G., Gaztanaga E., et al., 2018, MNRAS, 478, 592. doi:10.1093/mnras/sty957
McClintock T., Varga T. N., Gruen D., Rozo E., Rykoff E. S., Shin T., Melchior P., et al., 2019, MNRAS, 482, 1352. doi:10.1093/mnras/sty2711