Example 2: Realistic data and wrong model
Fit halo mass to shear profile with realistic data and wrong model
the LSST-DESC CLMM team
This notebook demonstrates how to use clmm
to estimate a WL halo
mass from observations of a galaxy cluster. It uses several
functionalities of the support mock_data
module to produce datasets
of increasing complexity. This notebook also demonstrates the bias
introduced on the reconstructed mass by a naive fit, when the redshift
distribution of the background galaxies is not properly accounted for in
the model. Organization of this notebook goes as follows:
Setting things up, with the proper imports.
Generating 3 datasets: an ideal dataset (dataset1) similar to that of Example1 (single source plane); an ideal dataset but with source galaxies following the Chang et al. (2013) redshift distribution (dataset2); a noisy dataset where photoz errors and shape noise are also included (dataset3).
Computing the binned reduced tangential shear profile, for the 3 datasets, using logarithmic binning.
Setting up the “single source plane” model to be fitted to the 3 datasets. Only dataset1 has a single source plane, so we expect to see a bias in the reconstructed mass when using this model on datasets 2 and 3.
Perform a simple fit using
scipy.optimize.curve_fit
and visualize the results.
Setup
First, we import some standard packages.
import clmm
import matplotlib.pyplot as plt
import numpy as np
from numpy import random
from clmm.support.sampler import fitters
clmm.__version__
'1.12.0'
Next, we import clmm
’s core modules.
import clmm.dataops as da
import clmm.galaxycluster as gc
import clmm.theory as theory
from clmm import Cosmology
We then import a support modules for a specific data sets. clmm
includes support modules that enable the user to generate mock data in a
format compatible with clmm
.
from clmm.support import mock_data as mock
Making mock data
For reproducibility:
np.random.seed(11)
To create mock data, we need to define a true cosmology.
mock_cosmo = Cosmology(H0=70.0, Omega_dm0=0.27 - 0.045, Omega_b0=0.045, Omega_k0=0.0)
We now set some parameters for a mock galaxy cluster.
cosmo = mock_cosmo
cluster_m = 1.0e15 # M200,m [Msun]
cluster_z = 0.3
concentration = 4
ngals = 10000
Delta = 200
cluster_ra = 20.0
cluster_dec = 70.0
Then we use the mock_data
support module to generate 3 galaxy
catalogs: - ideal_data
: all background galaxies at the same
redshift. - ideal_data_z
: galaxies distributed according to the
Chang et al. (2013) redshift distribution. - noisy_data_z
:
ideal_data_z
+ photoz errors + shape noise
ideal_data = mock.generate_galaxy_catalog(
cluster_m,
cluster_z,
concentration,
cosmo,
0.8,
ngals=ngals,
cluster_ra=cluster_ra,
cluster_dec=cluster_dec,
)
ideal_data_z = mock.generate_galaxy_catalog(
cluster_m,
cluster_z,
concentration,
cosmo,
"chang13",
ngals=ngals,
cluster_ra=cluster_ra,
cluster_dec=cluster_dec,
)
noisy_data_z = mock.generate_galaxy_catalog(
cluster_m,
cluster_z,
concentration,
cosmo,
"chang13",
shapenoise=0.05,
photoz_sigma_unscaled=0.05,
ngals=ngals,
cluster_ra=cluster_ra,
cluster_dec=cluster_dec,
)
The galaxy catalogs are converted to a clmm.GalaxyCluster
object and
may be saved for later use.
cluster_id = "CL_ideal"
gc_object = clmm.GalaxyCluster(cluster_id, cluster_ra, cluster_dec, cluster_z, ideal_data)
gc_object.save("ideal_GC.pkl")
cluster_id = "CL_ideal_z"
gc_object = clmm.GalaxyCluster(cluster_id, cluster_ra, cluster_dec, cluster_z, ideal_data_z)
gc_object.save("ideal_GC_z.pkl")
cluster_id = "CL_noisy_z"
gc_object = clmm.GalaxyCluster(cluster_id, cluster_ra, cluster_dec, cluster_z, noisy_data_z)
gc_object.save("noisy_GC_z.pkl")
Any saved clmm.GalaxyCluster
object may be read in for analysis.
cl1 = clmm.GalaxyCluster.load("ideal_GC.pkl") # all background galaxies at the same redshift
cl2 = clmm.GalaxyCluster.load(
"ideal_GC_z.pkl"
) # background galaxies distributed according to Chang et al. (2013)
cl3 = clmm.GalaxyCluster.load("noisy_GC_z.pkl") # same as cl2 but with photoz error and shape noise
print("Cluster info = ID:", cl2.unique_id, "; ra:", cl2.ra, "; dec:", cl2.dec, "; z_l :", cl2.z)
print("The number of source galaxies is :", len(cl2.galcat))
Cluster info = ID: CL_ideal_z ; ra: 20.0 ; dec: 70.0 ; z_l : 0.3
The number of source galaxies is : 10000
h = plt.hist(cl2.galcat["z"], bins=50)
![../_images/Example2_Fit_Halo_Mass_to_Shear_Catalog_21_0.png](../_images/Example2_Fit_Halo_Mass_to_Shear_Catalog_21_0.png)
Deriving observables
Computing shear
clmm.dataops.compute_tangential_and_cross_components
calculates the
tangential and cross shears for each source galaxy in the cluster.
theta1, g_t1, g_x1 = cl1.compute_tangential_and_cross_components()
theta2, g_t2, g_x2 = cl2.compute_tangential_and_cross_components()
theta2, g_t3, g_x3 = cl3.compute_tangential_and_cross_components()
Radially binning the data
bin_edges = da.make_bins(0.7, 4, 15, method="evenlog10width")
clmm.dataops.make_radial_profile
evaluates the average shear of the
galaxy catalog in bins of radius.
profile1 = cl1.make_radial_profile("Mpc", bins=bin_edges, cosmo=cosmo)
profile2 = cl2.make_radial_profile("Mpc", bins=bin_edges, cosmo=cosmo)
profile3 = cl3.make_radial_profile("Mpc", bins=bin_edges, cosmo=cosmo)
/global/homes/a/aguena/.local/perlmutter/python-3.11/lib/python3.11/site-packages/clmm-1.12.0-py3.11.egg/clmm/utils/statistic.py:97: RuntimeWarning: invalid value encountered in sqrt
After running clmm.dataops.make_radial_profile
on a
clmm.GalaxyCluster
object, the object acquires the
clmm.GalaxyCluster.profile
attribute.
for n in cl1.profile.colnames:
cl1.profile[n].format = "%6.3e"
cl1.profile.pprint(max_width=-1)
radius_min radius radius_max gt gt_err gx gx_err z z_err n_src W_l
---------- --------- ---------- --------- --------- ---------- --------- --------- --------- --------- ---------
7.000e-01 7.446e-01 7.863e-01 4.359e-02 1.313e-04 -1.791e-17 1.058e-18 8.000e-01 nan 5.800e+01 5.800e+01
7.863e-01 8.403e-01 8.831e-01 3.990e-02 1.101e-04 -1.573e-17 9.259e-19 8.000e-01 6.427e-09 8.600e+01 8.600e+01
8.831e-01 9.395e-01 9.920e-01 3.659e-02 1.003e-04 -1.245e-17 8.502e-19 8.000e-01 nan 9.500e+01 9.500e+01
9.920e-01 1.052e+00 1.114e+00 3.337e-02 8.507e-05 -1.269e-17 6.310e-19 8.000e-01 nan 1.320e+02 1.320e+02
1.114e+00 1.187e+00 1.251e+00 3.007e-02 7.161e-05 -1.018e-17 5.609e-19 8.000e-01 3.321e-09 1.510e+02 1.510e+02
1.251e+00 1.331e+00 1.406e+00 2.711e-02 6.612e-05 -1.023e-17 4.313e-19 8.000e-01 nan 1.670e+02 1.670e+02
1.406e+00 1.495e+00 1.579e+00 2.428e-02 5.252e-05 -8.955e-18 3.321e-19 8.000e-01 3.367e-09 2.350e+02 2.350e+02
1.579e+00 1.677e+00 1.773e+00 2.167e-02 3.915e-05 -8.762e-18 2.340e-19 8.000e-01 nan 3.430e+02 3.430e+02
1.773e+00 1.887e+00 1.992e+00 1.918e-02 3.437e-05 -6.828e-18 2.168e-19 8.000e-01 nan 4.010e+02 4.010e+02
1.992e+00 2.119e+00 2.237e+00 1.694e-02 2.755e-05 -6.239e-18 1.655e-19 8.000e-01 4.245e-09 4.990e+02 4.990e+02
2.237e+00 2.377e+00 2.513e+00 1.490e-02 2.161e-05 -5.460e-18 1.265e-19 8.000e-01 4.009e-09 6.630e+02 6.630e+02
2.513e+00 2.672e+00 2.823e+00 1.302e-02 1.846e-05 -4.757e-18 1.027e-19 8.000e-01 nan 7.800e+02 7.800e+02
2.823e+00 2.999e+00 3.171e+00 1.136e-02 1.422e-05 -4.126e-18 7.797e-20 8.000e-01 nan 1.042e+03 1.042e+03
3.171e+00 3.369e+00 3.561e+00 9.847e-03 1.131e-05 -3.659e-18 5.939e-20 8.000e-01 6.299e-09 1.301e+03 1.301e+03
3.561e+00 3.781e+00 4.000e+00 8.514e-03 8.968e-06 -3.057e-18 4.728e-20 8.000e-01 4.246e-09 1.644e+03 1.644e+03
We visualize the radially binned shear for the 3 configurations
fig = plt.figure(figsize=(10, 6))
fsize = 14
fig.gca().errorbar(
profile1["radius"], profile1["gt"], yerr=profile1["gt_err"], marker="o", label="z_src = 0.8"
)
fig.gca().errorbar(
profile2["radius"],
profile2["gt"],
yerr=profile2["gt_err"],
marker="o",
label="z_src = Chang et al. (2013)",
)
fig.gca().errorbar(
profile3["radius"],
profile3["gt"],
yerr=profile3["gt_err"],
marker="o",
label="z_src = Chang et al. (2013) + photoz err + shape noise",
)
plt.gca().set_title(r"Binned shear of source galaxies", fontsize=fsize)
plt.gca().set_xlabel(r"$r\;[Mpc]$", fontsize=fsize)
plt.gca().set_ylabel(r"$g_t$", fontsize=fsize)
plt.legend()
<matplotlib.legend.Legend at 0x7f8080861b90>
![../_images/Example2_Fit_Halo_Mass_to_Shear_Catalog_33_1.png](../_images/Example2_Fit_Halo_Mass_to_Shear_Catalog_33_1.png)
Create the halo model
# model definition to be used with scipy.optimize.curve_fit
def shear_profile_model(r, logm, z_src):
m = 10.0**logm
gt_model = clmm.compute_reduced_tangential_shear(
r, m, concentration, cluster_z, z_src, cosmo, delta_mdef=200, halo_profile_model="nfw"
)
return gt_model
Fitting a halo mass - highlighting bias when not accounting for the source redshift distribution in the model
We estimate the best-fit mass using scipy.optimize.curve_fit
.
Here, to build the model we make the WRONG assumption that the average
shear in bin \(i\) equals the shear at the average redshift in the
bin; i.e. we assume that
\(\langle g_t\rangle_i = g_t(\langle z\rangle_i)\). This should not
impact cluster 1
as all sources are located at the same redshift.
However, this yields a bias in the reconstructed mass for cluster 2
and cluster 3
, where the sources followed the Chang et al. (2013)
distribution.
# Cluster 1: ideal data
popt1, pcov1 = fitters["curve_fit"](
lambda r, logm: shear_profile_model(r, logm, profile1["z"]),
profile1["radius"],
profile1["gt"],
profile1["gt_err"],
bounds=[13.0, 17.0],
)
# popt1,pcov1 = spo.curve_fit(lambda r, logm:shear_profile_model(r, logm, profile1['z']),
# profile1['radius'],
# profile1['gt'],
# sigma=profile1['gt_err'], bounds=[13.,17.])
m_est1 = 10.0 ** popt1[0]
m_est_err1 = m_est1 * np.sqrt(pcov1[0][0]) * np.log(10) # convert the error on logm to error on m
# Cluster 2: ideal data with redshift distribution
popt2, pcov2 = fitters["curve_fit"](
lambda r, logm: shear_profile_model(r, logm, profile2["z"]),
profile2["radius"],
profile2["gt"],
profile2["gt_err"],
bounds=[13.0, 17.0],
)
m_est2 = 10.0 ** popt2[0]
m_est_err2 = m_est2 * np.sqrt(pcov2[0][0]) * np.log(10) # convert the error on logm to error on m
# Cluster 3: noisy data with redshift distribution
popt3, pcov3 = fitters["curve_fit"](
lambda r, logm: shear_profile_model(r, logm, profile3["z"]),
profile3["radius"],
profile3["gt"],
profile3["gt_err"],
bounds=[13.0, 17.0],
)
m_est3 = 10.0 ** popt3[0]
m_est_err3 = m_est3 * np.sqrt(pcov3[0][0]) * np.log(10) # convert the error on logm to error on m
print(f"Best fit mass for cluster 1 = {m_est1:.2e} +/- {m_est_err1:.2e} Msun")
print(f"Best fit mass for cluster 2 = {m_est2:.2e} +/- {m_est_err2:.2e} Msun")
print(f"Best fit mass for cluster 3 = {m_est3:.2e} +/- {m_est_err3:.2e} Msun")
Best fit mass for cluster 1 = 1.00e+15 +/- 6.10e+11 Msun
Best fit mass for cluster 2 = 8.68e+14 +/- 3.21e+12 Msun
Best fit mass for cluster 3 = 8.87e+14 +/- 3.95e+13 Msun
As expected, the reconstructed mass is biased whenever the sources are not located at a single redshift as this was not accounted for in the model.
Visualization of the results
For visualization purpose, we calculate the reduced tangential shear predicted by the model when using the average redshift of the catalog.
rr = np.logspace(-0.5, np.log10(5), 100)
gt_model1 = clmm.compute_reduced_tangential_shear(
rr,
m_est1,
concentration,
cluster_z,
np.mean(cl1.galcat["z"]),
cosmo,
delta_mdef=200,
halo_profile_model="nfw",
)
gt_model2 = clmm.compute_reduced_tangential_shear(
rr,
m_est2,
concentration,
cluster_z,
np.mean(cl2.galcat["z"]),
cosmo,
delta_mdef=200,
halo_profile_model="nfw",
)
gt_model3 = clmm.compute_reduced_tangential_shear(
rr,
m_est3,
concentration,
cluster_z,
np.mean(cl3.galcat["z"]),
cosmo,
delta_mdef=200,
halo_profile_model="nfw",
)
We visualize that prediction of reduced tangential shear along with the data
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))
axes[0].errorbar(
profile1["radius"],
profile1["gt"],
profile1["gt_err"],
color="red",
label="ideal_data, M_input = %.3e Msun" % cluster_m,
fmt=".",
)
axes[0].plot(
rr,
gt_model1,
color="red",
label="best fit model 1, M_fit = %.2e +/- %.2e" % (m_est1, m_est_err1),
)
axes[0].errorbar(
profile2["radius"],
profile2["gt"],
profile2["gt_err"],
color="green",
label="ideal_data_z, M_input = %.3e Msun" % cluster_m,
fmt=".",
)
axes[0].plot(
rr,
gt_model2,
color="green",
label="best fit model 2, M_fit = %.2e +/- %.2e" % (m_est2, m_est_err2),
)
axes[0].set_title("Ideal data w/wo src redshift distribution", fontsize=fsize)
axes[0].semilogx()
axes[0].semilogy()
axes[0].legend(fontsize=fsize)
axes[0].set_xlabel("R [Mpc]", fontsize=fsize)
axes[0].set_ylabel("reduced tangential shear", fontsize=fsize)
axes[1].errorbar(
profile3["radius"],
profile3["gt"],
profile3["gt_err"],
color="red",
label="noisy_data_z, M_input = %.3e Msun" % cluster_m,
fmt=".",
)
axes[1].plot(
rr,
gt_model3,
color="red",
label="best fit model 3, M_fit = %.2e +/- %.2e" % (m_est3, m_est_err3),
)
axes[1].set_title("Noisy data with src redshift distribution", fontsize=fsize)
axes[1].semilogx()
axes[1].semilogy()
axes[1].legend(fontsize=fsize)
axes[1].set_xlabel("R [Mpc]", fontsize=fsize)
axes[1].set_ylabel("reduced tangential shear", fontsize=fsize)
fig.tight_layout()
![../_images/Example2_Fit_Halo_Mass_to_Shear_Catalog_43_0.png](../_images/Example2_Fit_Halo_Mass_to_Shear_Catalog_43_0.png)