Example 1: Ideal data

Fit halo mass to shear profile with ideal data

the LSST-DESC CLMM team

This notebook demonstrates how to use clmm to estimate a WL halo mass of a galaxy cluster in the ideal case: i) all galaxies on a single source plane, ii) no redshift errors, iii) no shape noise. The steps below correspond to: - Setting things up, with the proper imports. - Generating a ideal mock dataset. - Computing the binned reduced tangential shear profile, for two different binning scheme. - Setting up the model to be fitted to the data. - Perform a simple fit using scipy.optimize.basinhopping and visualize the results.

Setup

First, we import some standard packages.

import numpy as np
import matplotlib.pyplot as plt

from numpy import random

Next, we import clmm’s core modules.

import clmm
import clmm.dataops as da
import clmm.galaxycluster as gc
import clmm.theory as theory
from clmm import Cosmology

Make sure we know which version we’re using

clmm.__version__

'1.12.0'

We then import a support modules for a specific data sets. clmm includes support modules that enable the user to generate mock data in a format compatible with clmm. We also provide support modules for processing other specific data sets for use with clmm. Any existing support module can be used as a template for creating a new support module for another data set. If you do make such a support module, please do consider making a pull request so we can add it for others to use.

Making mock data

from clmm.support import mock_data as mock

np.random.seed(11)

To create mock data, we need to define a true cosmology.

mock_cosmo = Cosmology(H0=70.0, Omega_dm0=0.27 - 0.045, Omega_b0=0.045, Omega_k0=0.0)

We now set some parameters for a mock galaxy cluster.

cosmo = mock_cosmo
cluster_id = "Awesome_cluster"
cluster_m = 1.0e15  # M200,m [Msun]
cluster_z = 0.3
src_z = 0.8
concentration = 4
ngals = 10000
cluster_ra = 20.0
cluster_dec = 30.0

Then we use the mock_data support module to generate a new galaxy catalog.

ideal_data = mock.generate_galaxy_catalog(
    cluster_m,
    cluster_z,
    concentration,
    cosmo,
    src_z,
    ngals=ngals,
    cluster_ra=cluster_ra,
    cluster_dec=cluster_dec,
)

This galaxy catalog is then converted to a clmm.GalaxyCluster object.

gc_object = clmm.GalaxyCluster(cluster_id, cluster_ra, cluster_dec, cluster_z, ideal_data)

A clmm.GalaxyCluster object can be pickled and saved for later use.

gc_object.save("mock_GC.pkl")

Any saved clmm.GalaxyCluster object may be read in for analysis.

cl = clmm.GalaxyCluster.load("mock_GC.pkl")
print("Cluster info = ID:", cl.unique_id, "; ra:", cl.ra, "; dec:", cl.dec, "; z_l :", cl.z)
print("The number of source galaxies is :", len(cl.galcat))

ra_l = cl.ra
dec_l = cl.dec
z = cl.z
e1 = cl.galcat["e1"]
e2 = cl.galcat["e2"]
ra_s = cl.galcat["ra"]
dec_s = cl.galcat["dec"]
z_s = cl.galcat["z"]

Cluster info = ID: Awesome_cluster ; ra: 20.0 ; dec: 30.0 ; z_l : 0.3
The number of source galaxies is : 10000

We can visualize the distribution of galaxies on the sky.

fsize = 15

fig = plt.figure(figsize=(10, 6))
hb = fig.gca().hexbin(ra_s, dec_s, gridsize=50)

cb = fig.colorbar(hb)
cb.set_label("Number of sources in bin", fontsize=fsize)

plt.gca().set_xlabel(r"$\Delta RA$", fontsize=fsize)
plt.gca().set_ylabel(r"$\Delta Dec$", fontsize=fsize)
plt.gca().set_title("Source Galaxies", fontsize=fsize)

plt.show()

../_images/Example1_Fit_Halo_Mass_to_Shear_Catalog_25_0.png

clmm separates cosmology-dependent and cosmology-independent functionality.

Deriving observables

We first demonstrate a few of the procedures one can perform on data without assuming a cosmology.

Computing shear

clmm.dataops.compute_tangential_and_cross_components calculates the tangential and cross shears for each source galaxy in the cluster.

theta, g_t, g_x = da.compute_tangential_and_cross_components(ra_l, dec_l, ra_s, dec_s, e1, e2)

We can visualize the shear field at each galaxy location.

fig = plt.figure(figsize=(10, 6))

fig.gca().loglog(theta, g_t, ".")
plt.ylabel("reduced shear", fontsize=fsize)
plt.xlabel("angular distance [rad]", fontsize=fsize)

Text(0.5, 0, 'angular distance [rad]')

../_images/Example1_Fit_Halo_Mass_to_Shear_Catalog_32_1.png

Radially binning the data

Here we compare the reconstructed mass under two different bin definitions.

Note binning would cause fitted mass to be slightly larger than input mass. The reason is that g(r), the tangential reduced shear along cluster radius, is a convex function – the function value after binning would be larger, but the bias becomes smaller as bin number increases.

bin_edges1 = da.make_bins(0.01, 3.7, 50)
bin_edges2 = da.make_bins(0.01, 3.7, 10)

clmm.dataops.make_radial_profile evaluates the average shear of the galaxy catalog in bins of radius.

res1 = da.make_radial_profile(
    [g_t, g_x, z_s],
    theta,
    "radians",
    "Mpc",
    bins=bin_edges1,
    cosmo=cosmo,
    z_lens=z,
    include_empty_bins=False,
)
res2 = da.make_radial_profile(
    [g_t, g_x, z_s],
    theta,
    "radians",
    "Mpc",
    bins=bin_edges2,
    cosmo=cosmo,
    z_lens=z,
    include_empty_bins=False,
)

/global/homes/a/aguena/.local/perlmutter/python-3.11/lib/python3.11/site-packages/clmm-1.12.0-py3.11.egg/clmm/utils/statistic.py:97: RuntimeWarning: invalid value encountered in sqrt
/global/homes/a/aguena/.local/perlmutter/python-3.11/lib/python3.11/site-packages/clmm-1.12.0-py3.11.egg/clmm/utils/statistic.py:97: RuntimeWarning: invalid value encountered in sqrt

Note that we set include_empty_bins=False explicitly here even though it is the default behavior. Setting the argument to True would also return empty bins (that is, bins with at most one data point in them), which would have to be excluded manually when fitting, though it might be useful e.g., when combining datasets. To clarify the behavior, consider the following comparison:

res_with_empty = da.make_radial_profile(
    [g_t, g_x, z_s],
    theta,
    "radians",
    "Mpc",
    bins=1000,
    cosmo=cosmo,
    z_lens=z,
    include_empty_bins=True,
)
# this is the default behavior
res_without_empty = da.make_radial_profile(
    [g_t, g_x, z_s],
    theta,
    "radians",
    "Mpc",
    bins=1000,
    cosmo=cosmo,
    z_lens=z,
    include_empty_bins=False,
)
res_with_empty["n_src"].size, res_without_empty["n_src"].size

/global/homes/a/aguena/.local/perlmutter/python-3.11/lib/python3.11/site-packages/clmm-1.12.0-py3.11.egg/clmm/utils/statistic.py:97: RuntimeWarning: invalid value encountered in sqrt
/global/homes/a/aguena/.local/perlmutter/python-3.11/lib/python3.11/site-packages/clmm-1.12.0-py3.11.egg/clmm/utils/statistic.py:97: RuntimeWarning: invalid value encountered in sqrt
/global/homes/a/aguena/.local/perlmutter/python-3.11/lib/python3.11/site-packages/clmm-1.12.0-py3.11.egg/clmm/utils/statistic.py:97: RuntimeWarning: invalid value encountered in sqrt
/global/homes/a/aguena/.local/perlmutter/python-3.11/lib/python3.11/site-packages/clmm-1.12.0-py3.11.egg/clmm/utils/statistic.py:97: RuntimeWarning: invalid value encountered in sqrt

(1000, 892)

i.e., 108 bins have fewer than two sources in them and are excluded by default (when setting the random seed to 11).

For later use, we’ll define some variables for the binned radius and tangential shear.

gt_profile1 = res1["p_0"]
r1 = res1["radius"]
z1 = res1["p_2"]

gt_profile2 = res2["p_0"]
r2 = res2["radius"]
z2 = res2["p_2"]

We visualize the radially binned shear for our mock galaxies.

fig = plt.figure(figsize=(10, 6))

fig.gca().loglog(r1, gt_profile1, ".", label="50 bins")
fig.gca().loglog(r2, gt_profile2, "+", markersize=15, label="10 bins")
plt.legend(fontsize=fsize)

plt.gca().set_title(r"Binned shear of source galaxies", fontsize=fsize)
plt.gca().set_xlabel(r"$r\;[Mpc]$", fontsize=fsize)
plt.gca().set_ylabel(r"$g_t$", fontsize=fsize)

Text(0, 0.5, '$g_t$')

../_images/Example1_Fit_Halo_Mass_to_Shear_Catalog_44_1.png

You can also run make_radial_profile direct on a clmm.GalaxyCluster object.

cl.compute_tangential_and_cross_components()  # You need to add the shear components first
cl.make_radial_profile("Mpc", bins=1000, cosmo=cosmo, include_empty_bins=False)
pass

/global/homes/a/aguena/.local/perlmutter/python-3.11/lib/python3.11/site-packages/clmm-1.12.0-py3.11.egg/clmm/utils/statistic.py:97: RuntimeWarning: invalid value encountered in sqrt
/global/homes/a/aguena/.local/perlmutter/python-3.11/lib/python3.11/site-packages/clmm-1.12.0-py3.11.egg/clmm/utils/statistic.py:97: RuntimeWarning: invalid value encountered in sqrt

After running clmm.GalaxyCluster.make_radial_profile object, the object acquires the clmm.GalaxyCluster.profile attribute.

for n in cl.profile.colnames:
    cl.profile[n].format = "%6.3e"
cl.profile.pprint(max_width=-1)

radius_min   radius  radius_max     gt      gt_err      gx       gx_err      z       z_err     n_src      W_l
---------- --------- ---------- --------- --------- ---------- --------- --------- --------- --------- ---------
929e-02 2.182e-02  2.490e-02 1.699e-01 4.854e-03 -8.478e-17 1.073e-18 8.000e-01 0.000e+00 2.000e+00 2.000e+00
172e-02 4.415e-02  4.733e-02 1.354e-01 7.376e-05 -6.592e-17 2.453e-18 8.000e-01 0.000e+00 2.000e+00 2.000e+00
855e-02 6.252e-02  6.416e-02 1.224e-01 2.610e-04 -5.898e-17 2.453e-18 8.000e-01 0.000e+00 2.000e+00 2.000e+00
595e-01 1.611e-01  1.651e-01 9.224e-02 7.363e-05 -4.163e-17 4.907e-18 8.000e-01 0.000e+00 2.000e+00 2.000e+00
931e-01 1.937e-01  1.987e-01 8.657e-02 3.080e-05 -3.990e-17 1.227e-18 8.000e-01 0.000e+00 2.000e+00 2.000e+00
100e-01 2.141e-01  2.156e-01 8.349e-02 9.282e-05 -1.735e-17 1.227e-17 8.000e-01 0.000e+00 2.000e+00 2.000e+00
156e-01 2.186e-01  2.212e-01 8.283e-02 1.114e-04 -4.077e-17 4.089e-19 8.000e-01 0.000e+00 3.000e+00 3.000e+00
324e-01 2.357e-01  2.380e-01 8.049e-02 8.085e-05 -1.973e-17 1.395e-17 8.000e-01 0.000e+00 2.000e+00 2.000e+00
604e-01 2.644e-01  2.660e-01 7.687e-02 1.141e-05 -2.017e-17 1.181e-17 8.000e-01 0.000e+00 2.000e+00 2.000e+00
773e-01 2.800e-01  2.829e-01 7.505e-02 4.587e-05 -1.041e-17 9.342e-18 8.000e-01 0.000e+00 4.000e+00 4.000e+00
885e-01 2.898e-01  2.941e-01 7.396e-02 4.115e-05 -3.426e-17 3.756e-19 8.000e-01 0.000e+00 4.000e+00 4.000e+00
053e-01 3.084e-01  3.109e-01 7.197e-02 9.003e-05 -2.125e-17 1.012e-17 8.000e-01 0.000e+00 2.000e+00 2.000e+00
109e-01 3.134e-01  3.165e-01 7.146e-02 2.563e-05 -2.353e-17 8.827e-18 8.000e-01 0.000e+00 4.000e+00 4.000e+00
389e-01 3.407e-01  3.446e-01 6.877e-02 8.012e-05 -3.441e-17 2.361e-19 8.000e-01 0.000e+00 3.000e+00 3.000e+00
614e-01 3.642e-01  3.670e-01 6.661e-02 9.981e-05 -3.383e-17 6.133e-19 8.000e-01 0.000e+00 2.000e+00 2.000e+00
726e-01 3.750e-01  3.782e-01 6.567e-02 1.123e-06 -3.469e-17 4.907e-18 8.000e-01 0.000e+00 2.000e+00 2.000e+00
782e-01 3.795e-01  3.838e-01 6.528e-02 4.830e-05 -1.583e-17 8.815e-18 8.000e-01 0.000e+00 4.000e+00 4.000e+00
894e-01 3.924e-01  3.950e-01 6.419e-02 4.180e-05 -1.561e-17 1.104e-17 8.000e-01 0.000e+00 2.000e+00 2.000e+00
950e-01 3.975e-01  4.006e-01 6.378e-02 7.287e-05 -2.429e-17 8.498e-18 8.000e-01 0.000e+00 3.000e+00 3.000e+00
062e-01 4.072e-01  4.118e-01 6.299e-02 2.972e-05 -1.626e-17 6.784e-18 8.000e-01 0.000e+00 4.000e+00 4.000e+00
175e-01 4.201e-01  4.231e-01 6.197e-02 6.136e-05 -2.949e-17 1.227e-18 8.000e-01 0.000e+00 2.000e+00 2.000e+00
287e-01 4.320e-01  4.343e-01 6.106e-02 2.670e-05 -1.475e-17 1.043e-17 8.000e-01 0.000e+00 2.000e+00 2.000e+00
399e-01 4.434e-01  4.455e-01 6.022e-02 5.868e-05 -2.147e-17 6.205e-18 8.000e-01 0.000e+00 4.000e+00 4.000e+00
       ...       ...        ...       ...       ...        ...       ...       ...       ...       ...       ...
296e+00 5.301e+00  5.302e+00 5.438e-03 4.602e-07 -9.035e-19 7.377e-19 8.000e-01 0.000e+00 3.000e+00 3.000e+00
302e+00 5.304e+00  5.308e+00 5.434e-03 9.193e-07 -2.656e-18 1.917e-20 8.000e-01 0.000e+00 4.000e+00 4.000e+00
308e+00 5.311e+00  5.313e+00 5.424e-03 7.930e-07 -2.645e-18 1.644e-20 8.000e-01       nan 5.000e+00 5.000e+00
313e+00 5.316e+00  5.319e+00 5.417e-03 1.115e-06 -1.753e-18 7.157e-19 8.000e-01 0.000e+00 3.000e+00 3.000e+00
319e+00 5.322e+00  5.324e+00 5.409e-03 1.845e-06 -1.355e-18 9.583e-19 8.000e-01 0.000e+00 2.000e+00 2.000e+00
324e+00 5.328e+00  5.330e+00 5.401e-03 1.143e-06 -2.103e-18 4.711e-19 8.000e-01       nan 5.000e+00 5.000e+00
330e+00 5.331e+00  5.336e+00 5.396e-03 1.009e-06 -1.764e-18 6.981e-19 8.000e-01 0.000e+00 3.000e+00 3.000e+00
352e+00 5.355e+00  5.358e+00 5.364e-03 1.619e-06 -2.683e-18 1.917e-20 8.000e-01 0.000e+00 2.000e+00 2.000e+00
358e+00 5.360e+00  5.364e+00 5.356e-03 6.537e-07 -1.269e-18 5.335e-19 8.000e-01 6.083e-09 6.000e+00 6.000e+00
369e+00 5.371e+00  5.375e+00 5.341e-03 1.780e-06 -1.301e-18 9.200e-19 8.000e-01 0.000e+00 2.000e+00 2.000e+00
375e+00 5.379e+00  5.380e+00 5.331e-03 1.503e-06 -2.602e-18 0.000e+00 8.000e-01 0.000e+00 2.000e+00 2.000e+00
380e+00 5.382e+00  5.386e+00 5.326e-03 1.690e-06 -2.606e-18 2.995e-21 8.000e-01 0.000e+00 2.000e+00 2.000e+00
386e+00 5.390e+00  5.392e+00 5.316e-03 1.169e-06 -2.602e-18 0.000e+00 8.000e-01 0.000e+00 2.000e+00 2.000e+00
397e+00 5.399e+00  5.403e+00 5.304e-03 7.402e-07 -8.583e-19 7.119e-19 8.000e-01 0.000e+00 3.000e+00 3.000e+00
420e+00 5.423e+00  5.425e+00 5.272e-03 1.025e-06 -1.714e-18 7.110e-19 8.000e-01 0.000e+00 3.000e+00 3.000e+00
425e+00 5.430e+00  5.431e+00 5.263e-03 1.139e-06 -1.274e-18 9.008e-19 8.000e-01 0.000e+00 2.000e+00 2.000e+00
437e+00 5.439e+00  5.442e+00 5.250e-03 9.064e-07 -2.602e-18 3.320e-20 8.000e-01 0.000e+00 4.000e+00 4.000e+00
448e+00 5.451e+00  5.453e+00 5.234e-03 6.452e-07 -1.941e-18 5.449e-19 8.000e-01 0.000e+00 4.000e+00 4.000e+00
459e+00 5.461e+00  5.465e+00 5.222e-03 1.022e-06 -1.274e-18 9.008e-19 8.000e-01 0.000e+00 2.000e+00 2.000e+00
470e+00 5.471e+00  5.476e+00 5.208e-03 8.019e-07 -1.281e-18 9.056e-19 8.000e-01 0.000e+00 2.000e+00 2.000e+00
504e+00 5.508e+00  5.509e+00 5.161e-03 1.050e-06 -8.324e-19 6.893e-19 8.000e-01 0.000e+00 3.000e+00 3.000e+00
509e+00 5.511e+00  5.515e+00 5.156e-03 7.879e-07 -2.507e-18 9.583e-21 8.000e-01 0.000e+00 2.000e+00 2.000e+00
616e+00 5.621e+00  5.622e+00 5.017e-03 2.007e-08 -2.462e-18 2.995e-21 8.000e-01 0.000e+00 2.000e+00 2.000e+00
Length = 892 rows

Modeling the data

We next demonstrate a few of the procedures one can perform once a cosmology has been chosen.

Choosing a halo model

Note: some samplers might pass arrays of size 1 to variables being minimized. This can cause problems as clmm functions type check the input. In this function below this is corrected by converting the mass to a float: m = float(10.**logm)

def nfw_to_shear_profile(logm, profile_info):
    [r, gt_profile, z_src_rbin] = profile_info
    m = float(10.0**logm)
    gt_model = clmm.compute_reduced_tangential_shear(
        r, m, concentration, cluster_z, z_src_rbin, cosmo, delta_mdef=200, halo_profile_model="nfw"
    )
    return np.sum((gt_model - gt_profile) ** 2)

Fitting a halo mass

We optimize to find the best-fit mass for the data under the two radial binning schemes.

from clmm.support.sampler import samplers

Note: The samplers[‘minimize’] is a local optimization function, so it does not guarantee to give consistent results (logm_est1 and logm_est2) for all logm_0, which is dependent on the np.random.seed(#) set previously. Some choices of np.random.seed(#) (e.g. set # to 1) will output a logm_0 that leads to a much larger logm_est and thus a misbehaving best-fit model. In contrast, the samplers[‘basinhopping’] is a global optimization function, which can give stable results regardless of the initial guesses, logm_0. Since its underlying method is the same as the samplers[‘minimize’], the two functions take the same arguments.

logm_0 = random.uniform(13.0, 17.0, 1)[0]
# logm_est1 = samplers['minimize'](nfw_to_shear_profile, logm_0,args=[r1, gt_profile1, z1])[0]
logm_est1 = samplers["basinhopping"](
    nfw_to_shear_profile, logm_0, minimizer_kwargs={"args": ([r1, gt_profile1, z1])}
)[0]

# logm_est2 = samplers['minimize'](nfw_to_shear_profile, logm_0,args=[r2, gt_profile2, z2])[0]
logm_est2 = samplers["basinhopping"](
    nfw_to_shear_profile, logm_0, minimizer_kwargs={"args": ([r2, gt_profile2, z2])}
)[0]

m_est1 = 10.0**logm_est1
m_est2 = 10.0**logm_est2

print((m_est1, m_est2))

/tmp/ipykernel_2160586/3502670224.py:3: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  m = float(10.0**logm)

(1019206578091240.5, 1060937922400428.9)

Next, we calculate the reduced tangential shear predicted by the two models.

rr = np.logspace(-2, np.log10(5), 100)

gt_model1 = clmm.compute_reduced_tangential_shear(
    rr, m_est1, concentration, cluster_z, src_z, cosmo, delta_mdef=200, halo_profile_model="nfw"
)

gt_model2 = clmm.compute_reduced_tangential_shear(
    rr, m_est2, concentration, cluster_z, src_z, cosmo, delta_mdef=200, halo_profile_model="nfw"
)

We visualize the two predictions of reduced tangential shear.

fig = plt.figure(figsize=(10, 6))

fig.gca().scatter(
    r1, gt_profile1, color="orange", label="binned mock data 1, M_input = %.3e Msun" % cluster_m
)
fig.gca().plot(rr, gt_model1, color="orange", label="best fit model 1, M_fit = %.3e" % m_est1)

fig.gca().scatter(
    r2,
    gt_profile2,
    color="blue",
    alpha=0.5,
    label="binned mock data 2, M_input = %.3e Msun" % cluster_m,
)
fig.gca().plot(
    rr,
    gt_model2,
    color="blue",
    linestyle="--",
    alpha=0.5,
    label="best fit model 2, M_fit = %.3e" % m_est2,
)

plt.semilogx()
plt.semilogy()

plt.legend()
plt.xlabel("R [Mpc]", fontsize=fsize)
plt.ylabel("reduced tangential shear", fontsize=fsize)

Text(0, 0.5, 'reduced tangential shear')

../_images/Example1_Fit_Halo_Mass_to_Shear_Catalog_60_1.png