Example 2: Realistic data and wrong model ========================================= Fit halo mass to shear profile with realistic data and wrong model ------------------------------------------------------------------ *the LSST-DESC CLMM team* This notebook demonstrates how to use ``clmm`` to estimate a WL halo mass from observations of a galaxy cluster. It uses several functionalities of the support ``mock_data`` module to produce datasets of increasing complexity. This notebook also demonstrates the bias introduced on the reconstructed mass by a naive fit, when the redshift distribution of the background galaxies is not properly accounted for in the model. Organization of this notebook goes as follows: - Setting things up, with the proper imports. - Generating 3 datasets: an ideal dataset (dataset1) similar to that of Example1 (single source plane); an ideal dataset but with source galaxies following the Chang et al. (2013) redshift distribution (dataset2); a noisy dataset where photoz errors and shape noise are also included (dataset3). - Computing the binned reduced tangential shear profile, for the 3 datasets, using logarithmic binning. - Setting up the “single source plane” model to be fitted to the 3 datasets. Only dataset1 has a single source plane, so we expect to see a bias in the reconstructed mass when using this model on datasets 2 and 3. - Perform a simple fit using ``scipy.optimize.curve_fit`` and visualize the results. Setup ----- First, we import some standard packages. .. code:: ipython3 import clmm import matplotlib.pyplot as plt import numpy as np from numpy import random from clmm.support.sampler import fitters clmm.__version__ .. parsed-literal:: '1.12.0' Next, we import ``clmm``\ ’s core modules. .. code:: ipython3 import clmm.dataops as da import clmm.galaxycluster as gc import clmm.theory as theory from clmm import Cosmology We then import a support modules for a specific data sets. ``clmm`` includes support modules that enable the user to generate mock data in a format compatible with ``clmm``. .. code:: ipython3 from clmm.support import mock_data as mock Making mock data ---------------- For reproducibility: .. code:: ipython3 np.random.seed(11) To create mock data, we need to define a true cosmology. .. code:: ipython3 mock_cosmo = Cosmology(H0=70.0, Omega_dm0=0.27 - 0.045, Omega_b0=0.045, Omega_k0=0.0) We now set some parameters for a mock galaxy cluster. .. code:: ipython3 cosmo = mock_cosmo cluster_m = 1.0e15 # M200,m [Msun] cluster_z = 0.3 concentration = 4 ngals = 10000 Delta = 200 cluster_ra = 20.0 cluster_dec = 70.0 Then we use the ``mock_data`` support module to generate 3 galaxy catalogs: - ``ideal_data``: all background galaxies at the same redshift. - ``ideal_data_z``: galaxies distributed according to the Chang et al. (2013) redshift distribution. - ``noisy_data_z``: ``ideal_data_z`` + photoz errors + shape noise .. code:: ipython3 ideal_data = mock.generate_galaxy_catalog( cluster_m, cluster_z, concentration, cosmo, 0.8, ngals=ngals, cluster_ra=cluster_ra, cluster_dec=cluster_dec, ) ideal_data_z = mock.generate_galaxy_catalog( cluster_m, cluster_z, concentration, cosmo, "chang13", ngals=ngals, cluster_ra=cluster_ra, cluster_dec=cluster_dec, ) noisy_data_z = mock.generate_galaxy_catalog( cluster_m, cluster_z, concentration, cosmo, "chang13", shapenoise=0.05, photoz_sigma_unscaled=0.05, ngals=ngals, cluster_ra=cluster_ra, cluster_dec=cluster_dec, ) The galaxy catalogs are converted to a ``clmm.GalaxyCluster`` object and may be saved for later use. .. code:: ipython3 cluster_id = "CL_ideal" gc_object = clmm.GalaxyCluster(cluster_id, cluster_ra, cluster_dec, cluster_z, ideal_data) gc_object.save("ideal_GC.pkl") cluster_id = "CL_ideal_z" gc_object = clmm.GalaxyCluster(cluster_id, cluster_ra, cluster_dec, cluster_z, ideal_data_z) gc_object.save("ideal_GC_z.pkl") cluster_id = "CL_noisy_z" gc_object = clmm.GalaxyCluster(cluster_id, cluster_ra, cluster_dec, cluster_z, noisy_data_z) gc_object.save("noisy_GC_z.pkl") Any saved ``clmm.GalaxyCluster`` object may be read in for analysis. .. code:: ipython3 cl1 = clmm.GalaxyCluster.load("ideal_GC.pkl") # all background galaxies at the same redshift cl2 = clmm.GalaxyCluster.load( "ideal_GC_z.pkl" ) # background galaxies distributed according to Chang et al. (2013) cl3 = clmm.GalaxyCluster.load("noisy_GC_z.pkl") # same as cl2 but with photoz error and shape noise print("Cluster info = ID:", cl2.unique_id, "; ra:", cl2.ra, "; dec:", cl2.dec, "; z_l :", cl2.z) print("The number of source galaxies is :", len(cl2.galcat)) .. parsed-literal:: Cluster info = ID: CL_ideal_z ; ra: 20.0 ; dec: 70.0 ; z_l : 0.3 The number of source galaxies is : 10000 .. code:: ipython3 h = plt.hist(cl2.galcat["z"], bins=50) .. image:: Example2_Fit_Halo_Mass_to_Shear_Catalog_files/Example2_Fit_Halo_Mass_to_Shear_Catalog_21_0.png Deriving observables -------------------- Computing shear ~~~~~~~~~~~~~~~ ``clmm.dataops.compute_tangential_and_cross_components`` calculates the tangential and cross shears for each source galaxy in the cluster. .. code:: ipython3 theta1, g_t1, g_x1 = cl1.compute_tangential_and_cross_components() theta2, g_t2, g_x2 = cl2.compute_tangential_and_cross_components() theta2, g_t3, g_x3 = cl3.compute_tangential_and_cross_components() Radially binning the data ~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 bin_edges = da.make_bins(0.7, 4, 15, method="evenlog10width") ``clmm.dataops.make_radial_profile`` evaluates the average shear of the galaxy catalog in bins of radius. .. code:: ipython3 profile1 = cl1.make_radial_profile("Mpc", bins=bin_edges, cosmo=cosmo) profile2 = cl2.make_radial_profile("Mpc", bins=bin_edges, cosmo=cosmo) profile3 = cl3.make_radial_profile("Mpc", bins=bin_edges, cosmo=cosmo) .. parsed-literal:: /global/homes/a/aguena/.local/perlmutter/python-3.11/lib/python3.11/site-packages/clmm-1.12.0-py3.11.egg/clmm/utils/statistic.py:97: RuntimeWarning: invalid value encountered in sqrt After running ``clmm.dataops.make_radial_profile`` on a ``clmm.GalaxyCluster`` object, the object acquires the ``clmm.GalaxyCluster.profile`` attribute. .. code:: ipython3 for n in cl1.profile.colnames: cl1.profile[n].format = "%6.3e" cl1.profile.pprint(max_width=-1) .. parsed-literal:: radius_min radius radius_max gt gt_err gx gx_err z z_err n_src W_l ---------- --------- ---------- --------- --------- ---------- --------- --------- --------- --------- --------- 7.000e-01 7.446e-01 7.863e-01 4.359e-02 1.313e-04 -1.791e-17 1.058e-18 8.000e-01 nan 5.800e+01 5.800e+01 7.863e-01 8.403e-01 8.831e-01 3.990e-02 1.101e-04 -1.573e-17 9.259e-19 8.000e-01 6.427e-09 8.600e+01 8.600e+01 8.831e-01 9.395e-01 9.920e-01 3.659e-02 1.003e-04 -1.245e-17 8.502e-19 8.000e-01 nan 9.500e+01 9.500e+01 9.920e-01 1.052e+00 1.114e+00 3.337e-02 8.507e-05 -1.269e-17 6.310e-19 8.000e-01 nan 1.320e+02 1.320e+02 1.114e+00 1.187e+00 1.251e+00 3.007e-02 7.161e-05 -1.018e-17 5.609e-19 8.000e-01 3.321e-09 1.510e+02 1.510e+02 1.251e+00 1.331e+00 1.406e+00 2.711e-02 6.612e-05 -1.023e-17 4.313e-19 8.000e-01 nan 1.670e+02 1.670e+02 1.406e+00 1.495e+00 1.579e+00 2.428e-02 5.252e-05 -8.955e-18 3.321e-19 8.000e-01 3.367e-09 2.350e+02 2.350e+02 1.579e+00 1.677e+00 1.773e+00 2.167e-02 3.915e-05 -8.762e-18 2.340e-19 8.000e-01 nan 3.430e+02 3.430e+02 1.773e+00 1.887e+00 1.992e+00 1.918e-02 3.437e-05 -6.828e-18 2.168e-19 8.000e-01 nan 4.010e+02 4.010e+02 1.992e+00 2.119e+00 2.237e+00 1.694e-02 2.755e-05 -6.239e-18 1.655e-19 8.000e-01 4.245e-09 4.990e+02 4.990e+02 2.237e+00 2.377e+00 2.513e+00 1.490e-02 2.161e-05 -5.460e-18 1.265e-19 8.000e-01 4.009e-09 6.630e+02 6.630e+02 2.513e+00 2.672e+00 2.823e+00 1.302e-02 1.846e-05 -4.757e-18 1.027e-19 8.000e-01 nan 7.800e+02 7.800e+02 2.823e+00 2.999e+00 3.171e+00 1.136e-02 1.422e-05 -4.126e-18 7.797e-20 8.000e-01 nan 1.042e+03 1.042e+03 3.171e+00 3.369e+00 3.561e+00 9.847e-03 1.131e-05 -3.659e-18 5.939e-20 8.000e-01 6.299e-09 1.301e+03 1.301e+03 3.561e+00 3.781e+00 4.000e+00 8.514e-03 8.968e-06 -3.057e-18 4.728e-20 8.000e-01 4.246e-09 1.644e+03 1.644e+03 We visualize the radially binned shear for the 3 configurations .. code:: ipython3 fig = plt.figure(figsize=(10, 6)) fsize = 14 fig.gca().errorbar( profile1["radius"], profile1["gt"], yerr=profile1["gt_err"], marker="o", label="z_src = 0.8" ) fig.gca().errorbar( profile2["radius"], profile2["gt"], yerr=profile2["gt_err"], marker="o", label="z_src = Chang et al. (2013)", ) fig.gca().errorbar( profile3["radius"], profile3["gt"], yerr=profile3["gt_err"], marker="o", label="z_src = Chang et al. (2013) + photoz err + shape noise", ) plt.gca().set_title(r"Binned shear of source galaxies", fontsize=fsize) plt.gca().set_xlabel(r"$r\;[Mpc]$", fontsize=fsize) plt.gca().set_ylabel(r"$g_t$", fontsize=fsize) plt.legend() .. parsed-literal:: .. image:: Example2_Fit_Halo_Mass_to_Shear_Catalog_files/Example2_Fit_Halo_Mass_to_Shear_Catalog_33_1.png Create the halo model --------------------- .. code:: ipython3 # model definition to be used with scipy.optimize.curve_fit def shear_profile_model(r, logm, z_src): m = 10.0**logm gt_model = clmm.compute_reduced_tangential_shear( r, m, concentration, cluster_z, z_src, cosmo, delta_mdef=200, halo_profile_model="nfw" ) return gt_model Fitting a halo mass - highlighting bias when not accounting for the source redshift distribution in the model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We estimate the best-fit mass using ``scipy.optimize.curve_fit``. Here, to build the model we make the WRONG assumption that the average shear in bin :math:`i` equals the shear at the average redshift in the bin; i.e. we assume that :math:`\langle g_t\rangle_i = g_t(\langle z\rangle_i)`. This should not impact ``cluster 1`` as all sources are located at the same redshift. However, this yields a bias in the reconstructed mass for ``cluster 2`` and ``cluster 3``, where the sources followed the Chang et al. (2013) distribution. .. code:: ipython3 # Cluster 1: ideal data popt1, pcov1 = fitters["curve_fit"]( lambda r, logm: shear_profile_model(r, logm, profile1["z"]), profile1["radius"], profile1["gt"], profile1["gt_err"], bounds=[13.0, 17.0], ) # popt1,pcov1 = spo.curve_fit(lambda r, logm:shear_profile_model(r, logm, profile1['z']), # profile1['radius'], # profile1['gt'], # sigma=profile1['gt_err'], bounds=[13.,17.]) m_est1 = 10.0 ** popt1[0] m_est_err1 = m_est1 * np.sqrt(pcov1[0][0]) * np.log(10) # convert the error on logm to error on m # Cluster 2: ideal data with redshift distribution popt2, pcov2 = fitters["curve_fit"]( lambda r, logm: shear_profile_model(r, logm, profile2["z"]), profile2["radius"], profile2["gt"], profile2["gt_err"], bounds=[13.0, 17.0], ) m_est2 = 10.0 ** popt2[0] m_est_err2 = m_est2 * np.sqrt(pcov2[0][0]) * np.log(10) # convert the error on logm to error on m # Cluster 3: noisy data with redshift distribution popt3, pcov3 = fitters["curve_fit"]( lambda r, logm: shear_profile_model(r, logm, profile3["z"]), profile3["radius"], profile3["gt"], profile3["gt_err"], bounds=[13.0, 17.0], ) m_est3 = 10.0 ** popt3[0] m_est_err3 = m_est3 * np.sqrt(pcov3[0][0]) * np.log(10) # convert the error on logm to error on m print(f"Best fit mass for cluster 1 = {m_est1:.2e} +/- {m_est_err1:.2e} Msun") print(f"Best fit mass for cluster 2 = {m_est2:.2e} +/- {m_est_err2:.2e} Msun") print(f"Best fit mass for cluster 3 = {m_est3:.2e} +/- {m_est_err3:.2e} Msun") .. parsed-literal:: Best fit mass for cluster 1 = 1.00e+15 +/- 6.10e+11 Msun Best fit mass for cluster 2 = 8.68e+14 +/- 3.21e+12 Msun Best fit mass for cluster 3 = 8.87e+14 +/- 3.95e+13 Msun As expected, the reconstructed mass is biased whenever the sources are not located at a single redshift as this was not accounted for in the model. Visualization of the results ---------------------------- For visualization purpose, we calculate the reduced tangential shear predicted by the model when using the average redshift of the catalog. .. code:: ipython3 rr = np.logspace(-0.5, np.log10(5), 100) gt_model1 = clmm.compute_reduced_tangential_shear( rr, m_est1, concentration, cluster_z, np.mean(cl1.galcat["z"]), cosmo, delta_mdef=200, halo_profile_model="nfw", ) gt_model2 = clmm.compute_reduced_tangential_shear( rr, m_est2, concentration, cluster_z, np.mean(cl2.galcat["z"]), cosmo, delta_mdef=200, halo_profile_model="nfw", ) gt_model3 = clmm.compute_reduced_tangential_shear( rr, m_est3, concentration, cluster_z, np.mean(cl3.galcat["z"]), cosmo, delta_mdef=200, halo_profile_model="nfw", ) We visualize that prediction of reduced tangential shear along with the data .. code:: ipython3 fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6)) axes[0].errorbar( profile1["radius"], profile1["gt"], profile1["gt_err"], color="red", label="ideal_data, M_input = %.3e Msun" % cluster_m, fmt=".", ) axes[0].plot( rr, gt_model1, color="red", label="best fit model 1, M_fit = %.2e +/- %.2e" % (m_est1, m_est_err1), ) axes[0].errorbar( profile2["radius"], profile2["gt"], profile2["gt_err"], color="green", label="ideal_data_z, M_input = %.3e Msun" % cluster_m, fmt=".", ) axes[0].plot( rr, gt_model2, color="green", label="best fit model 2, M_fit = %.2e +/- %.2e" % (m_est2, m_est_err2), ) axes[0].set_title("Ideal data w/wo src redshift distribution", fontsize=fsize) axes[0].semilogx() axes[0].semilogy() axes[0].legend(fontsize=fsize) axes[0].set_xlabel("R [Mpc]", fontsize=fsize) axes[0].set_ylabel("reduced tangential shear", fontsize=fsize) axes[1].errorbar( profile3["radius"], profile3["gt"], profile3["gt_err"], color="red", label="noisy_data_z, M_input = %.3e Msun" % cluster_m, fmt=".", ) axes[1].plot( rr, gt_model3, color="red", label="best fit model 3, M_fit = %.2e +/- %.2e" % (m_est3, m_est_err3), ) axes[1].set_title("Noisy data with src redshift distribution", fontsize=fsize) axes[1].semilogx() axes[1].semilogy() axes[1].legend(fontsize=fsize) axes[1].set_xlabel("R [Mpc]", fontsize=fsize) axes[1].set_ylabel("reduced tangential shear", fontsize=fsize) fig.tight_layout() .. image:: Example2_Fit_Halo_Mass_to_Shear_Catalog_files/Example2_Fit_Halo_Mass_to_Shear_Catalog_43_0.png