Table of Contents .. raw:: html

.. raw:: html

   <h1>

Table of Contents

.. raw:: html

   </h1>

.. container:: toc

   .. raw:: html

      <ul class="toc-item">

   .. raw:: html

      <li>

   1  Generate random data and add to catalog

   .. raw:: html

      </li>

   .. raw:: html

      <li>

   2  Match catalogs

   .. raw:: html

      </li>

   .. raw:: html

      <li>

   3  Find the scaling relation between the masses of the two catalogs

   .. raw:: html

      </li>

   .. raw:: html

      <li>

   4  Recovery rate

   .. raw:: html

      <ul class="toc-item">

   .. raw:: html

      <li>

   4.1  Panels plots

   .. raw:: html

      </li>

   .. raw:: html

      </ul>

   .. raw:: html

      </li>

   .. raw:: html

      <li>

   5  Difference with threshold mass

   .. raw:: html

      </li>

   .. raw:: html

      <li>

   6  Difference catalog mass

   .. raw:: html

      </li>

   .. raw:: html

      </ul>

Intrinsic Completeness
======================

When two catalogs with different mass proxies and a threshold mass
proxy, there is a intrinsic completeness limitaion on the relative
completeness of the catalogs.

We can associate the mass proxy relation of two catalog with:

.. math::

    P(M_2|M_1)\frac{dn}{dM_1}=P(M_1|M_2)\frac{dn}{dM_2},


where :math:`P(M_1|M_2)` is the probability of having a catalog 1
cluster with mass :math:`M_1` given a catalog 2 cluster with mass
:math:`M_2`, and :math:`\frac{dn}{dM_1}` is the mass function of catalog
1. Then the number of catalog 1 cluster with mass :math:`M_1` can be
computed as:

.. math::

    n(M_1)=\int_{-\infty}^{\infty}dM_2 \frac{dn}{dM_2}P(M_1|M_2).


However, if catalog 2 has a cut given by some threshold
:math:`M_2^{th}`, the number of clusters we can expect to measure is:

.. math::

    n_{M_2^{th}}(M_1)=\int_{M_2^{th}}^{\infty}dM_2 \frac{dn}{dM_2}P(M_1|M_2),


and the relative completeness resulting is:

.. math::

    c_{M_2^{th}}(M_1)=
    \frac{n_{M_2^{th}}(M_1)}{n(M_1)}=
    \frac
    {\int_{M_2^{th}}^{\infty}dM_2 \frac{dn}{dM_2}P(M_1|M_2)}
    {\int_{-\infty}^{\infty}dM_2 \frac{dn}{dM_2}P(M_1|M_2)}.


We can see that this is just the integral of :math:`P(M_1|M_2)` weighted
by the mass function :math:`\frac{dn}{dM_2}`. In an approximation where
this factor presents minor effect (see the `appendix <#approx>`__), we
can estimate this completeness as:

.. math::

    c_{M_2^{th}}(M_1)=
    \frac
    {\int_{M_2^{th}}^{\infty}dM_2 P(M_1|M_2)}
    {\int_{-\infty}^{\infty}dM_2 P(M_1|M_2)}.


This is the estimation that can be done in ``clevar``.

.. code:: ipython3

    %load_ext autoreload
    %autoreload 2
    
    import numpy as np
    import pylab as plt
    from IPython.display import Markdown as md

Generate random data and add to catalog
---------------------------------------

.. code:: ipython3

    # For reproducibility
    np.random.seed(1)

.. code:: ipython3

    from support import gen_cluster
    logm_scatter = 0.1
    input1, input2 = gen_cluster(lnm_scatter=logm_scatter*np.log(10))


.. parsed-literal::

    Initial number of clusters (logM>12.70): 1,925
    Clusters in catalog1: 908
    Clusters in catalog2: 945


.. code:: ipython3

    from clevar import ClCatalog
    c1 = ClCatalog('Cat1', ra=input1['RA'], dec=input1['DEC'], z=input1['Z'], mass=input1['MASS'],
                mass_err=input1['MASS_ERR'], z_err=input1['Z_ERR'])
    c2 = ClCatalog('Cat2', ra=input2['RA'], dec=input2['DEC'], z=input2['Z'], mass=input2['MASS'],
                mass_err=input2['MASS_ERR'], z_err=input2['Z_ERR'])
    # Format for nice display
    for c in ('ra', 'dec', 'z', 'z_err'):
        c1[c].info.format = '.2f'
        c2[c].info.format = '.2f'
    for c in ('mass', 'mass_err'):
        c1[c].info.format = '.2e'
        c2[c].info.format = '.2e'


.. parsed-literal::

    /home/aguena/.local/lib/python3.9/site-packages/clevar-0.13.2-py3.9.egg/clevar/catalog.py:267: UserWarning: id column missing, additional one is being created.
      warnings.warn(


Match catalogs
--------------

.. code:: ipython3

    from clevar.match import ProximityMatch
    from clevar.cosmology import AstroPyCosmology
    
    match_config = {
        'type': 'cross', # options are cross, cat1, cat2
        'which_radius': 'max', # Case of radius to be used, can be: cat1, cat2, min, max
        'preference': 'angular_proximity', # options are more_massive, angular_proximity or redshift_proximity
        'catalog1': {'delta_z':.2,
                    'match_radius': '10 arcsec'
                    },
        'catalog2': {'delta_z':.2,
                    'match_radius': '10 arcsec'
                    }
    }
    
    mt = ProximityMatch()
    mt.match_from_config(c1, c2, match_config)


.. parsed-literal::

    
    ## ClCatalog 1
    ## Prep mt_cols
    * zmin|zmax from config value
    * ang radius from set scale
    
    ## ClCatalog 2
    ## Prep mt_cols
    * zmin|zmax from config value
    * ang radius from set scale
    
    ## Multiple match (catalog 1)
    Finding candidates (Cat1)
    * 822/908 objects matched.
    
    ## Multiple match (catalog 2)
    Finding candidates (Cat2)
    * 822/945 objects matched.
    
    ## Finding unique matches of catalog 1
    Unique Matches (Cat1)
    * 822/908 objects matched.
    
    ## Finding unique matches of catalog 2
    Unique Matches (Cat2)
    * 822/945 objects matched.
    Cross Matches (Cat1)
    * 822/908 objects matched.
    Cross Matches (Cat2)
    * 822/945 objects matched.


Find the scaling relation between the masses of the two catalogs
----------------------------------------------------------------

.. code:: ipython3

    from clevar.match_metrics import scaling

.. code:: ipython3

    zbins = np.linspace(0, 2, 21)
    mbins = np.logspace(13, 14, 5)

.. code:: ipython3

    info2_mr = scaling.mass_density_metrics(
        c2, c1, 'cross', ax_rotation=45,
        add_err=False, plt_kwargs={'s':5},
        mask1=c2['mass']>2e13,
        add_fit=True, fit_bins1=20, fit_bins2=10, fit_err2=None,
        bins1=20)


.. image:: intrinsic_completeness_files/intrinsic_completeness_12_0.png


The output of ``scaling.mass*`` functions contains a
``['fit']['func_dist_interp']`` element representing :math:`P(M_1|M_2)`,
which can be passed directly to the recovery rates to estimate the
intrinsic completeness.

Recovery rate
-------------

Compute recovery rates, they are computed in mass and redshift bins. To
add the intrinsic completeness, also pass the following arguments: \*
``p_m1_m2``: :math:`P(M_1|M_2)` - Probability of having a catalog 1
cluster with mass :math:`M_1` given a catalog 2 cluster with mass
:math:`M_2`. \* ``min_mass2``: Minimum mass (proxy) of the other
catalog.

.. code:: ipython3

    from clevar.match_metrics import recovery

When plotted as a function of redshift with mass bins, it will add
shaded reagions corresponding to the completeness of each mass bin:

.. code:: ipython3

    zbins = np.linspace(0, 2, 5)
    mbins = np.append(np.logspace(13, 13.4, 4), 1e15)
    info = recovery.plot(c1, 'cross', zbins, mbins, 
                         p_m1_m2=info2_mr['fit']['func_dist_interp'], min_mass2=1e13)
    plt.show()


.. image:: intrinsic_completeness_files/intrinsic_completeness_17_0.png


When plotted as a function of mass with redshift bins, it will add a
shaded region correspong to the intrinsic completeness as a function of
mass:

.. code:: ipython3

    zbins = np.linspace(0, 2, 5)
    mbins = np.logspace(13, 14, 20)
    info = recovery.plot(
        c1, 'cross', zbins, mbins,
        shape='line', transpose=True,
        p_m1_m2=info2_mr['fit']['func_dist_interp'], min_mass2=1e13)


.. image:: intrinsic_completeness_files/intrinsic_completeness_19_0.png


In this case specifically, we do know the true distribution
:math:`P(M_2|M_1)` that was used by ``gen_cluster`` to generate the
clusters. So we can compare these estimated results with the “true”
intrinsic completeness:

.. code:: ipython3

    p_m1_m2 = lambda m1, m2: np.exp(-0.5*(np.log10(m2/m1))**2/logm_scatter**2)/np.sqrt(2*np.pi*logm_scatter**2)

.. code:: ipython3

    # By redshift
    zbins = np.linspace(0, 2, 5)
    mbins = np.append(np.logspace(13, 13.4, 4), 1e15)
    info = recovery.plot(c1, 'cross', zbins, mbins, 
                         p_m1_m2=p_m1_m2, min_mass2=1e13)
    plt.show()
    
    # By mass
    zbins = np.linspace(0, 2, 5)
    mbins = np.logspace(13, 14, 20)
    info = recovery.plot(
        c1, 'cross', zbins, mbins,
        shape='line', transpose=True,
        p_m1_m2=p_m1_m2, min_mass2=1e13)


.. image:: intrinsic_completeness_files/intrinsic_completeness_22_0.png


.. image:: intrinsic_completeness_files/intrinsic_completeness_22_1.png


.. code:: ipython3

    # By redshift
    zbins = np.linspace(0, 2, 5)
    mbins = np.append(np.logspace(13, 13.4, 4), 1e15)
    info = recovery.plot(c1, 'cross', zbins, mbins, 
                         p_m1_m2=p_m1_m2, min_mass2=1e13)
    plt.show()
    
    # By mass
    zbins = np.linspace(0, 2, 5)
    mbins = np.logspace(13, 14, 20)
    info = recovery.plot(
        c1, 'cross', zbins, mbins,
        shape='line', transpose=True,
        p_m1_m2=p_m1_m2, min_mass2=1e13)


.. image:: intrinsic_completeness_files/intrinsic_completeness_23_0.png


.. image:: intrinsic_completeness_files/intrinsic_completeness_23_1.png


Panels plots
~~~~~~~~~~~~

You can also have a panel for each bin:

.. code:: ipython3

    zbins = np.linspace(0, 2, 21)
    mbins = np.append(np.logspace(13, 13.4, 4), 1e15)
    info = recovery.plot_panel(c1, 'cross', zbins, mbins,
                               p_m1_m2=p_m1_m2, min_mass2=1e13)
    
    zbins = np.linspace(0, 2, 5)
    mbins = np.logspace(13, 14, 20)
    info = recovery.plot_panel(c1, 'cross', zbins, mbins, transpose=True,
                              p_m1_m2=p_m1_m2, min_mass2=1e13)


.. image:: intrinsic_completeness_files/intrinsic_completeness_25_0.png


.. image:: intrinsic_completeness_files/intrinsic_completeness_25_1.png


Appendix: Estimating deviation in completeness approximation 
============================================================

Here we can compute the deviation of disregarding the mass function
factor in the intrinsic completeness estimation, i. e. comparing:

.. math::

    c(M)=
    \frac
    {\int_{M_2^{th}}^{\infty}dM_2 \frac{dn}{dM_2}P(M_1|M_2)}
    {\int_{-\infty}^{\infty}dM_2 \frac{dn}{dM_2}P(M_1|M_2)},


with

.. math::

    c_{approx}(M_1)=
    \frac
    {\int_{M_2^{th}}^{\infty}dM_2 P(M_1|M_2)}
    {\int_{-\infty}^{\infty}dM_2 P(M_1|M_2)}.


.. code:: ipython3

    # Mass function approximation, derived from DC2 simulation:
    dn_dlogm = lambda x: 10**np.poly1d([ -0.4102,   9.6586, -52.4729])(x)
    
    # Gaussian distribution for logm
    p_m1_m2 = lambda m1, m2: np.exp(-0.5*(np.log10(m2/m1))**2/logm_scatter**2)/np.sqrt(2*np.pi*logm_scatter**2)

.. code:: ipython3

    from scipy.integrate import quad_vec
    
    # Limits for integrals
    min_log_mass2_norm = 10
    max_log_mass2 = 16
    
    # Normalized integral
    def normalized_integral(m1, m2_th, func):
        integ = quad_vec(func, np.log10(m2_th), max_log_mass2, epsabs=1e-50)[0]
        norm = quad_vec(func, min_log_mass2_norm, max_log_mass2, epsabs=1e-50)[0]
        if hasattr(integ, '__len__'):
            integ[norm>0] /= norm[norm>0]
        else:
            integ = integ/norm if norm>0 else integ
        return integ
    
    # Intrinsic completenss
    comp_full = lambda m1, m2_th: normalized_integral(
        m1, m2_th, lambda logm2: p_m1_m2(m1, 10**logm2)*dn_dlogm(logm2))
    
    # Intrinsic completeness disregarding mass function
    comp_approx = lambda m1, m2_th: normalized_integral(
        m1, m2_th, lambda logm2: p_m1_m2(m1, 10**logm2))

We can estimate the difference as a function of :math:`M_1` or
:math:`M_2^{th}`. We will see both below.

Difference with threshold mass
------------------------------

Here we want to see the difference of this approximations as a function
of :math:`M_2^{th}`.

.. code:: ipython3

    m1 = 1e13

.. code:: ipython3

    md(f'We start by fixing $M_1$ at $10^{{{np.log10(m1):.1f}}}$'
      r' and evaluating the difference of the arguments $\frac{dn}{dM_2}P(M_1|M_2)$ vs $P(M_1|M_2)$:')


We start by fixing :math:`M_1` at :math:`10^{13.0}` and evaluating the
difference of the arguments :math:`\frac{dn}{dM_2}P(M_1|M_2)` vs
:math:`P(M_1|M_2)`:


.. code:: ipython3

    m2_vec = np.logspace(12, 15, 500)
    norm1 = quad_vec(lambda logm2: p_m1_m2(m1, 10**logm2)*dn_dlogm(logm2),
        min_log_mass2_norm, max_log_mass2, epsabs=1e-50)[0]
    norm2 = quad_vec(lambda logm2: p_m1_m2(m1, 10**logm2),
        min_log_mass2_norm, max_log_mass2, epsabs=1e-50)[0]
    
    plt.plot(m2_vec, p_m1_m2(m1, m2_vec)*dn_dlogm(np.log10(m2_vec))/norm1,
             label=r'$\frac{dn}{dM_2}P(M_1|M_2)$')
    plt.plot(m2_vec, p_m1_m2(m1, m2_vec)/norm2, label='$P(M_1|M_2)$')
    
    
    plt.axvline(m1, ls='--', color='r')
    plt.text(m1, 0, '$M_1$', color='r')
    
    plt.xscale('log')
    plt.xlabel('$M_2$')
    plt.ylabel('kernel')
    plt.legend(fontsize=16)
    plt.show()


.. image:: intrinsic_completeness_files/intrinsic_completeness_33_0.png


There is only a small shift in the argument to be integrated.

When making the integrals:

.. code:: ipython3

    m2_th_vec = np.logspace(12, 14, 100)
    
    int_full_m1fix = np.array([comp_full(m1, m2) for m2 in m2_th_vec])
    int_approx_m1fix = np.array([comp_approx(m1, m2) for m2 in m2_th_vec])

and checking the difference:

.. code:: ipython3

    f, axes = plt.subplots(2, figsize=(6, 8), sharex=True)
    
    axes[0].plot(m2_th_vec, int_full_m1fix, label=r'$\frac{dn}{dM_2}P(M_2|M_1)$')
    axes[0].plot(m2_th_vec, int_approx_m1fix, label='$P(M_2|M_1)$')
    
    diff_m1fix = int_approx_m1fix-int_full_m1fix
    axes[1].plot(m2_th_vec, diff_m1fix)
    
    axes[0].legend(fontsize=16)
    axes[0].set_ylabel('$c_{M_2^{th}}(M_1=10^{13})$')
    axes[1].set_ylabel('$difference$')
    axes[1].set_xlabel('$M_2^{th}$')
    
    
    for ax in axes:
        ax.set_xscale('log')
        ax.axvline(m1, ls='--', color='r')
        ax.text(m1, 0, '$M_1$', color='r')
    
    plt.show()


.. image:: intrinsic_completeness_files/intrinsic_completeness_37_0.png


.. code:: ipython3

    md(f'the maximum difference is of {diff_m1fix.max():.4f}. '
       f'Therefore the approximation only deviates by {diff_m1fix.max()*100:.0f}%,'
       ' and it occurs at $M_2^{th}$=$M_1$')


the maximum difference is of 0.0910. Therefore the approximation only
deviates by 9%, and it occurs at :math:`M_2^{th}`\ =\ :math:`M_1`


Difference catalog mass
-----------------------

Here we want to see the difference of this approximations as a function
of :math:`M_1`.

.. code:: ipython3

    m2_th = 1e13

.. code:: ipython3

    md(f'We start by setting $M_2^{{th}}$ at $10^{{{np.log10(m2_th):.1f}}}$'
      r' and evaluating the difference of the arguments $\frac{dn}{dM_2}P(M_1|M_2)$ vs $P(M_1|M_2)$:')


We start by setting :math:`M_2^{th}` at :math:`10^{13.0}` and evaluating
the difference of the arguments :math:`\frac{dn}{dM_2}P(M_1|M_2)` vs
:math:`P(M_1|M_2)`:


.. code:: ipython3

    m1_vec = np.logspace(12, 14, 201)
    
    int_full_m2fix = comp_full(m1_vec, m2_th)
    int_approx_m2fix = comp_approx(m1_vec, m2_th)

.. code:: ipython3

    f, axes = plt.subplots(2, figsize=(6, 8), sharex=True)
    
    axes[0].plot(m1_vec, int_full_m2fix, label=r'$\frac{dn}{dM_2}P(M_1|M_2)$')
    axes[0].plot(m1_vec, int_approx_m2fix, label='$P(M_1|M_2)$')
    
    diff_m2fix = int_approx_m2fix-int_full_m2fix
    axes[1].plot(m1_vec, diff_m2fix)
    
    axes[0].legend(fontsize=16)
    axes[0].set_ylabel('$c_{M_2^{th}=10^{13}}(M_1)$')
    axes[1].set_ylabel('$difference$')
    axes[1].set_xlabel('$M_1$')
    for ax in axes:
        ax.set_xscale('log')
        ax.axvline(m2_th, ls='--', color='r')
        ax.text(m2_th, 0, '$M_2^{th}$', color='r')
    
    plt.show()


.. image:: intrinsic_completeness_files/intrinsic_completeness_43_0.png


.. code:: ipython3

    md(f'The maximum difference is of {diff_m2fix.max():.4f}. '
       f'Therefore the approximation only deviates by {diff_m2fix.max()*100:.0f}%,'
       ' and it occurs at $M_1$=$M_2^{th}$')


The maximum difference is of 0.0918. Therefore the approximation only
deviates by 9%, and it occurs at :math:`M_1`\ =\ :math:`M_2^{th}`