Matching catalogs based on membership (simple)
==============================================

Matching two catalogs based on membership using a configuration dictionary

.. code:: ipython3

    %load_ext autoreload
    %autoreload 2

ClCatalogs
----------

Given some input data

.. code:: ipython3

    import numpy as np
    from astropy.table import Table

    # Cluster catalogs
    input1 = Table({'ID': ['CL0a', 'CL1a', 'CL2a', 'CL3a', 'CL4a']})
    input1['MASS'] = 1e14*np.arange(1, 6)*10
    input2 = Table({'ID': ['CL0b', 'CL1b', 'CL2b', 'CL3b']})
    input2['MASS'] = 1e14*np.arange(1, 5)*10
    display(input1)
    display(input2)

    # Member catalogs
    input1_mem = Table(
        {'ID':[
            'MEM0', 'MEM1', 'MEM2', 'MEM3', 'MEM4',
            'MEM5', 'MEM6', 'MEM7', 'MEM8', 'MEM9',
            'MEM10', 'MEM11', 'MEM12', 'MEM13', 'MEM14'],
         'ID_CLUSTER': [
            'CL0a', 'CL0a', 'CL0a', 'CL0a', 'CL0a',
            'CL1a', 'CL1a', 'CL1a', 'CL1a', 'CL2a',
            'CL2a', 'CL2a', 'CL3a', 'CL3a', 'CL4a'],
        })
    input2_mem = Table(
        {'ID':[
            'MEM0', 'MEM1', 'MEM2', 'MEM3', 'MEM4',
            'MEM5', 'MEM6', 'MEM7', 'MEM8', 'MEM9',
            'MEM10', 'MEM11', 'MEM12', 'MEM13'],
         'ID_CLUSTER': [
            'CL3b', 'CL0b', 'CL0b', 'CL0b', 'CL0b',
            'CL1b', 'CL1b', 'CL1b', 'CL1b', 'CL2b',
            'CL2b', 'CL2b', 'CL3b', 'CL3b'],
        })
    input1_mem['RA'] = np.arange(len(input1_mem))*10.0
    input2_mem['RA'] = np.arange(len(input2_mem))*10.0
    input1_mem['DEC'] = 0.0
    input2_mem['DEC'] = 0.0
    input1_mem['Z'] = 0.1
    input2_mem['Z'] = 0.1
    input1_mem['PMEM'] = 1.0
    input2_mem['PMEM'] = 1.0
    display(input1_mem)
    display(input2_mem)

.. raw:: html

Table length=5

ID	MASS
str4	float64
CL0a	1000000000000000.0
CL1a	2000000000000000.0
CL2a	3000000000000000.0
CL3a	4000000000000000.0
CL4a	5000000000000000.0

.. raw:: html

Table length=4

ID	MASS
str4	float64
CL0b	1000000000000000.0
CL1b	2000000000000000.0
CL2b	3000000000000000.0
CL3b	4000000000000000.0

.. raw:: html

Table length=15

ID	ID_CLUSTER	RA	DEC	Z	PMEM
str5	str4	float64	float64	float64	float64
MEM0	CL0a	0.0	0.0	0.1	1.0
MEM1	CL0a	10.0	0.0	0.1	1.0
MEM2	CL0a	20.0	0.0	0.1	1.0
MEM3	CL0a	30.0	0.0	0.1	1.0
MEM4	CL0a	40.0	0.0	0.1	1.0
MEM5	CL1a	50.0	0.0	0.1	1.0
MEM6	CL1a	60.0	0.0	0.1	1.0
MEM7	CL1a	70.0	0.0	0.1	1.0
MEM8	CL1a	80.0	0.0	0.1	1.0
MEM9	CL2a	90.0	0.0	0.1	1.0
MEM10	CL2a	100.0	0.0	0.1	1.0
MEM11	CL2a	110.0	0.0	0.1	1.0
MEM12	CL3a	120.0	0.0	0.1	1.0
MEM13	CL3a	130.0	0.0	0.1	1.0
MEM14	CL4a	140.0	0.0	0.1	1.0

.. raw:: html

Table length=14

ID	ID_CLUSTER	RA	DEC	Z	PMEM
str5	str4	float64	float64	float64	float64
MEM0	CL3b	0.0	0.0	0.1	1.0
MEM1	CL0b	10.0	0.0	0.1	1.0
MEM2	CL0b	20.0	0.0	0.1	1.0
MEM3	CL0b	30.0	0.0	0.1	1.0
MEM4	CL0b	40.0	0.0	0.1	1.0
MEM5	CL1b	50.0	0.0	0.1	1.0
MEM6	CL1b	60.0	0.0	0.1	1.0
MEM7	CL1b	70.0	0.0	0.1	1.0
MEM8	CL1b	80.0	0.0	0.1	1.0
MEM9	CL2b	90.0	0.0	0.1	1.0
MEM10	CL2b	100.0	0.0	0.1	1.0
MEM11	CL2b	110.0	0.0	0.1	1.0
MEM12	CL3b	120.0	0.0	0.1	1.0
MEM13	CL3b	130.0	0.0	0.1	1.0

Create two ``ClCatalog`` objects. They have the same properties as
``astropy`` tables, with additional functionality. For the membership
matching, the main columns to be included are:

- ``id`` - must correspond to ``id_cluster`` in the cluster member
  catalog.
- ``mass`` (or mass proxy) - necessary for proximity matching if
  ``shared_member_fraction`` is used as the preference criterion for
  unique matches (the default).

All of the columns can be added when creating the ``ClCatalog`` object
by passing them as keys:

::

    cat = ClCatalog('Cat', ra=[0, 1])

or can also be added afterwards:

::

    cat = ClCatalog('Cat')
    cat['ra'] = [0, 1]

.. code:: ipython3

    from clevar.catalog import ClCatalog
    c1 = ClCatalog('Cat1', id=input1['ID'], mass=input1['MASS'])
    c2 = ClCatalog('Cat2', id=input2['ID'], mass=input2['MASS'])

    # Format for nice display
    c1['mass'].info.format = '.2e'
    c2['mass'].info.format = '.2e'

    display(c1)
    display(c2)

.. raw:: html Cat1
tags: id(id), mass(mass)
Radius unit: None

ClData length=5

id	mass	mt_self	mt_other	mt_multi_self	mt_multi_other
str4	float64	object	object	object	object
CL0a	1.00e+15	None	None	[]	[]
CL1a	2.00e+15	None	None	[]	[]
CL2a	3.00e+15	None	None	[]	[]
CL3a	4.00e+15	None	None	[]	[]
CL4a	5.00e+15	None	None	[]	[]

.. raw:: html Cat2
tags: id(id), mass(mass)
Radius unit: None

ClData length=4

id	mass	mt_self	mt_other	mt_multi_self	mt_multi_other
str4	float64	object	object	object	object
CL0b	1.00e+15	None	None	[]	[]
CL1b	2.00e+15	None	None	[]	[]
CL2b	3.00e+15	None	None	[]	[]
CL3b	4.00e+15	None	None	[]	[]

The members can be added to the cluster object using the
``add_members`` function. It has a similar instantiation format to a
``ClCatalog`` object, where the columns are added by keyword arguments
(the key ``id_cluster`` is always necessary and must correspond to
``id`` in the main cluster catalog).

.. code:: ipython3

    c1.add_members(id=input1_mem['ID'], id_cluster=input1_mem['ID_CLUSTER'],
                   ra=input1_mem['RA'], dec=input1_mem['DEC'],
                   pmem=input1_mem['PMEM'])
    c2.add_members(id=input2_mem['ID'], id_cluster=input2_mem['ID_CLUSTER'],
                   ra=input2_mem['RA'], dec=input2_mem['DEC'],
                   pmem=input2_mem['PMEM'])

    display(c1.members)
    display(c2.members)

.. raw:: html members
tags: id(id), id_cluster(id_cluster), ra(ra), dec(dec), pmem(pmem)

ClData length=15

id	id_cluster	ra	dec	pmem	ind_cl
str5	str4	float64	float64	float64	int64
MEM0	CL0a	0.0	0.0	1.0	0
MEM1	CL0a	10.0	0.0	1.0	0
MEM2	CL0a	20.0	0.0	1.0	0
MEM3	CL0a	30.0	0.0	1.0	0
MEM4	CL0a	40.0	0.0	1.0	0
MEM5	CL1a	50.0	0.0	1.0	1
MEM6	CL1a	60.0	0.0	1.0	1
MEM7	CL1a	70.0	0.0	1.0	1
MEM8	CL1a	80.0	0.0	1.0	1
MEM9	CL2a	90.0	0.0	1.0	2
MEM10	CL2a	100.0	0.0	1.0	2
MEM11	CL2a	110.0	0.0	1.0	2
MEM12	CL3a	120.0	0.0	1.0	3
MEM13	CL3a	130.0	0.0	1.0	3
MEM14	CL4a	140.0	0.0	1.0	4

.. raw:: html members
tags: id(id), id_cluster(id_cluster), ra(ra), dec(dec), pmem(pmem)

ClData length=14

id	id_cluster	ra	dec	pmem	ind_cl
str5	str4	float64	float64	float64	int64
MEM0	CL3b	0.0	0.0	1.0	3
MEM1	CL0b	10.0	0.0	1.0	0
MEM2	CL0b	20.0	0.0	1.0	0
MEM3	CL0b	30.0	0.0	1.0	0
MEM4	CL0b	40.0	0.0	1.0	0
MEM5	CL1b	50.0	0.0	1.0	1
MEM6	CL1b	60.0	0.0	1.0	1
MEM7	CL1b	70.0	0.0	1.0	1
MEM8	CL1b	80.0	0.0	1.0	1
MEM9	CL2b	90.0	0.0	1.0	2
MEM10	CL2b	100.0	0.0	1.0	2
MEM11	CL2b	110.0	0.0	1.0	2
MEM12	CL3b	120.0	0.0	1.0	3
MEM13	CL3b	130.0	0.0	1.0	3
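
The ``ind_cl`` column in the member tables above appears to hold the row
index of each member's parent cluster. As a minimal sketch in plain
Python (not ``clevar``'s actual implementation; the helper name
``index_members`` is hypothetical), such a column could be built by
mapping each cluster ``id`` to its row number:

.. code:: python

    def index_members(cluster_ids, member_cluster_ids):
        """Return, for each member, the row index of its parent cluster."""
        row_of = {cl_id: ind for ind, cl_id in enumerate(cluster_ids)}
        # Design choice of this sketch: fail loudly on unknown clusters.
        missing = sorted(set(member_cluster_ids) - set(row_of))
        if missing:
            raise ValueError(f"id_cluster values not in cluster catalog: {missing}")
        return [row_of[cl_id] for cl_id in member_cluster_ids]

    # Same layout as the Cat2 members used in this example
    cluster_ids = ['CL0b', 'CL1b', 'CL2b', 'CL3b']
    member_cluster_ids = (['CL3b'] + ['CL0b']*4 + ['CL1b']*4
                          + ['CL2b']*3 + ['CL3b']*2)
    ind_cl = index_members(cluster_ids, member_cluster_ids)
    print(ind_cl)

This is why ``id_cluster`` in the member catalog has to correspond to
``id`` in the cluster catalog: without a common identifier the members
cannot be attached to a cluster row.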

The catalogs can also be read directly from files; for more details see
catalogs.ipynb.

Matching
--------

Import ``MembershipMatch`` and create an object for matching

.. code:: ipython3

    from clevar.match import MembershipMatch
    mt = MembershipMatch()

Prepare the configuration. The main values are:

- ``type``: Type of matching to be considered. Can be a simple match of
  ClCatalog1->ClCatalog2 (``cat1``), ClCatalog2->ClCatalog1 (``cat2``)
  or cross matching (``cross``).
- ``preference``: In cases where there are multiple matches, how the
  best candidate will be chosen.
- ``minimum_share_fraction1``: Minimum share fraction of catalog 1 to
  consider in matches (default=\ ``0``).
- ``minimum_share_fraction2``: Minimum share fraction of catalog 2 to
  consider in matches (default=\ ``0``).
- ``match_members``: Match the members catalogs (default=\ ``True``),
  necessary if not already made.
- ``match_members_kwargs``: dictionary of arguments to match members,
  needed if ``match_members=True``. Keys are:

  - ``method``\ (str): Method for matching. Options are ``id`` or
    ``angular_distance``.
  - ``radius``\ (str, None): For ``method='angular_distance'``. Radius
    for matching, with format ``'value unit'`` (ex: ``1 arcsec``,
    ``1 Mpc``).
  - ``cosmo``\ (clevar.Cosmology, None): For
    ``method='angular_distance'``. Cosmology object for when radius has
    physical units.

- ``match_members_save``: saves file with matched members
  (default=\ ``False``).
- ``match_members_load``: load matched members (default=\ ``False``),
  if ``True`` skips matching (and save) of members.
- ``match_members_file``: file to save matching of members, needed if
  ``match_members_save`` or ``match_members_load`` is ``True``.
- ``shared_members_fill``: Adds shared members dicts and nmem to
  mt_input in catalogs (default=\ ``True``), necessary if not already
  made.
- ``shared_members_save``: saves files with shared members
  (default=\ ``False``).
- ``shared_members_load``: load files with shared members
  (default=\ ``False``), if ``True`` skips matching (and save) of
  members and fill (and save) of shared members.
- ``shared_members_file``: Prefix of file names to save shared members,
  needed if ``shared_members_save`` or ``shared_members_load`` is
  ``True``.
- ``verbose``: Print result for individual matches
  (default=\ ``True``).

.. code:: ipython3

    match_config = {
        'type': 'cross', # options are cross, cat1, cat2
        'preference': 'shared_member_fraction', # other options are more_massive, angular_proximity or redshift_proximity
        'minimum_share_fraction': 0,
        'match_members_kwargs': {'method':'id'},
    }

Once the configuration is prepared, the whole process can be done with
one call:

.. code:: ipython3

    %%time
    mt.match_from_config(c1, c2, match_config)

.. parsed-literal::

    28 members were matched.

    ## Multiple match (catalog 1)
    Finding candidates (Cat1)
    * 4/5 objects matched.

    ## Multiple match (catalog 2)
    Finding candidates (Cat2)
    * 4/4 objects matched.

    ## Finding unique matches of catalog 1
    Unique Matches (Cat1)
    * 4/5 objects matched.

    ## Finding unique matches of catalog 2
    Unique Matches (Cat2)
    * 4/4 objects matched.
    Cross Matches (Cat1)
    * 4/5 objects matched.
    Cross Matches (Cat2)
    * 4/4 objects matched.
    CPU times: user 8.34 ms, sys: 438 µs, total: 8.78 ms
    Wall time: 8.12 ms

This will fill the matching columns in the catalogs:

- ``mt_multi_self``: Multiple matches found
- ``mt_multi_other``: Multiple matches found by the other catalog
- ``mt_self``: Best candidate found
- ``mt_other``: Best candidate found by the other catalog
- ``mt_frac_self``: Fraction of shared members with the best candidate
  found
- ``mt_frac_other``: Fraction of shared members with the best candidate
  found by the other catalog, relative to the other catalog
- ``mt_cross``: Best candidate found in both directions

If ``pmem`` is present in the members catalogs, the shared fractions
are computed by:

.. math::

    \frac{\sum_{shared\;members}Pmem_i}{\sum_{cluster\;members}Pmem_i}

.. code:: ipython3

    display(c1)
    display(c2)

.. raw:: html Cat1
tags: id(id), mass(mass)
Radius unit: None

ClData length=5

									mt_input
id	mass	mt_self	mt_other	mt_multi_self	mt_multi_other	mt_frac_self	mt_frac_other	mt_cross	share_mems	nmem
str4	float64	object	object	object	object	float64	float64	object	object	float64
CL0a	1.00e+15	CL0b	CL0b	['CL3b', 'CL0b']	['CL3b', 'CL0b']	0.8	1.0	CL0b	{'CL3b': 1.0, 'CL0b': 4.0}	5.0
CL1a	2.00e+15	CL1b	CL1b	['CL1b']	['CL1b']	1.0	1.0	CL1b	{'CL1b': 4.0}	4.0
CL2a	3.00e+15	CL2b	CL2b	['CL2b']	['CL2b']	1.0	1.0	CL2b	{'CL2b': 3.0}	3.0
CL3a	4.00e+15	CL3b	CL3b	['CL3b']	['CL3b']	1.0	0.6666666666666666	CL3b	{'CL3b': 2.0}	2.0
CL4a	5.00e+15	None	None	[]	[]	0.0	0.0	None	{}	1.0

.. raw:: html Cat2
tags: id(id), mass(mass)
Radius unit: None

ClData length=4

									mt_input
id	mass	mt_self	mt_other	mt_multi_self	mt_multi_other	mt_frac_other	mt_frac_self	mt_cross	share_mems	nmem
str4	float64	object	object	object	object	float64	float64	object	object	float64
CL0b	1.00e+15	CL0a	CL0a	['CL0a']	['CL0a']	0.8	1.0	CL0a	{'CL0a': 4.0}	4.0
CL1b	2.00e+15	CL1a	CL1a	['CL1a']	['CL1a']	1.0	1.0	CL1a	{'CL1a': 4.0}	4.0
CL2b	3.00e+15	CL2a	CL2a	['CL2a']	['CL2a']	1.0	1.0	CL2a	{'CL2a': 3.0}	3.0
CL3b	4.00e+15	CL3a	CL3a	['CL3a', 'CL0a']	['CL3a', 'CL0a']	1.0	0.6666666666666666	CL3a	{'CL3a': 2.0, 'CL0a': 1.0}	3.0
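
The ``mt_frac_self``, ``mt_self`` and ``mt_cross`` values in the tables
above can be reproduced by hand. The following is a minimal sketch in
plain Python, not ``clevar``'s implementation: the function names are
hypothetical, members are matched by ``id``, every member has
``pmem = 1`` as in this example, and ties between candidates are not
handled.

.. code:: python

    # Toy membership from this example: member id -> parent cluster.
    mem1 = {f'MEM{i}': cl for i, cl in enumerate(
        ['CL0a']*5 + ['CL1a']*4 + ['CL2a']*3 + ['CL3a']*2 + ['CL4a'])}
    mem2 = {f'MEM{i}': cl for i, cl in enumerate(
        ['CL3b'] + ['CL0b']*4 + ['CL1b']*4 + ['CL2b']*3 + ['CL3b']*2)}
    PMEM = 1.0  # membership probability, identical for all members here

    def shared_fractions(mem_self, mem_other):
        """Return {cluster: {other cluster: shared pmem / total pmem}}."""
        total, shared = {}, {}
        for mem_id, cl in mem_self.items():
            total[cl] = total.get(cl, 0.0) + PMEM
            other = mem_other.get(mem_id)
            if other is not None:  # member also belongs to the other catalog
                counts = shared.setdefault(cl, {})
                counts[other] = counts.get(other, 0.0) + PMEM
        return {cl: {o: s/total[cl] for o, s in shared.get(cl, {}).items()}
                for cl in total}

    def best_match(fractions):
        """Pick the candidate with the largest shared fraction, if any."""
        return {cl: max(cands, key=cands.get) if cands else None
                for cl, cands in fractions.items()}

    frac1 = shared_fractions(mem1, mem2)
    frac2 = shared_fractions(mem2, mem1)
    best1, best2 = best_match(frac1), best_match(frac2)
    # A cross match requires both catalogs to choose each other.
    cross1 = {cl: b if b is not None and best2.get(b) == cl else None
              for cl, b in best1.items()}

    print(frac1['CL0a'])  # CL0a shares 4 of its 5 members with CL0b
    print(frac2['CL3b'])  # CL3b shares 2 of its 3 members with CL3a
    print(cross1)

``CL0a`` keeps ``CL0b`` as its best candidate because 4 of its 5 members
are shared with it, while ``CL4a`` has no shared members and therefore
stays unmatched.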

The steps of matching are stored in the catalogs and can be checked:

.. code:: ipython3

    c1.show_mt_hist(50)

.. parsed-literal::

    multiple(cat1='Cat1', cat2='Cat2')
    multiple(cat1='Cat2', cat2='Cat1')
    unique(cat1='Cat1', cat2='Cat2', preference='shared_member_fraction', minimum_share_fraction=0)
    unique(cat1='Cat2', cat2='Cat1', preference='shared_member_fraction', minimum_share_fraction=0)

Save and Load
-------------

The results of the matching can easily be saved and loaded using
``ClEvaR`` tools:

.. code:: ipython3

    mt.save_matches(c1, c2, out_dir='temp', overwrite=True)

.. code:: ipython3

    mt.load_matches(c1, c2, out_dir='temp')
    display(c1)
    display(c2)

.. parsed-literal::

    Cat1
    << ClEvar used in matching: 0.13.2 >>
    * Total objects: 5
    * multiple (self): 4
    * multiple (other): 4
    * unique (self): 4
    * unique (other): 4
    * cross: 4
    Cat2
    << ClEvar used in matching: 0.13.2 >>
    * Total objects: 4
    * multiple (self): 4
    * multiple (other): 4
    * unique (self): 4
    * unique (other): 4
    * cross: 4

.. raw:: html Cat1
tags: id(id), mass(mass)
Radius unit: None

ClData length=5

									mt_input
id	mass	mt_self	mt_other	mt_multi_self	mt_multi_other	mt_frac_self	mt_frac_other	mt_cross	share_mems	nmem
str4	float64	object	object	object	object	float64	float64	object	object	float64
CL0a	1.00e+15	CL0b	CL0b	['CL3b', 'CL0b']	['CL3b', 'CL0b']	0.8	1.0	CL0b	{'CL3b': 1.0, 'CL0b': 4.0}	5.0
CL1a	2.00e+15	CL1b	CL1b	['CL1b']	['CL1b']	1.0	1.0	CL1b	{'CL1b': 4.0}	4.0
CL2a	3.00e+15	CL2b	CL2b	['CL2b']	['CL2b']	1.0	1.0	CL2b	{'CL2b': 3.0}	3.0
CL3a	4.00e+15	CL3b	CL3b	['CL3b']	['CL3b']	1.0	0.6666666666666666	CL3b	{'CL3b': 2.0}	2.0
CL4a	5.00e+15	None	None	[]	[]	0.0	0.0	None	{}	1.0

.. raw:: html Cat2
tags: id(id), mass(mass)
Radius unit: None

ClData length=4

									mt_input
id	mass	mt_self	mt_other	mt_multi_self	mt_multi_other	mt_frac_other	mt_frac_self	mt_cross	share_mems	nmem
str4	float64	object	object	object	object	float64	float64	object	object	float64
CL0b	1.00e+15	CL0a	CL0a	['CL0a']	['CL0a']	0.8	1.0	CL0a	{'CL0a': 4.0}	4.0
CL1b	2.00e+15	CL1a	CL1a	['CL1a']	['CL1a']	1.0	1.0	CL1a	{'CL1a': 4.0}	4.0
CL2b	3.00e+15	CL2a	CL2a	['CL2a']	['CL2a']	1.0	1.0	CL2a	{'CL2a': 3.0}	3.0
CL3b	4.00e+15	CL3a	CL3a	['CL3a', 'CL0a']	['CL3a', 'CL0a']	1.0	0.6666666666666666	CL3a	{'CL3a': 2.0, 'CL0a': 1.0}	3.0

Getting Matched Pairs
---------------------

``clevar`` has built-in functionality to plot some results of the
matching, such as:

- Recovery rates
- Distances (angular and redshift) of cluster centers
- Scaling relations (mass, redshift, …)

For those cases, check the match_metrics.ipynb and
match_metrics_advanced.ipynb notebooks.

If those do not meet your needs, you can directly get the matched pairs
of clusters:

.. code:: ipython3

    from clevar.match import get_matched_pairs
    mt1, mt2 = get_matched_pairs(c1, c2, 'cross')

These will be catalogs with the corresponding matched pairs:

.. code:: ipython3

    import matplotlib.pyplot as plt
    plt.scatter(mt1['mass'], mt2['mass'])

.. image:: membership_matching_files/membership_matching_25_1.png

Members of matched pairs
~~~~~~~~~~~~~~~~~~~~~~~~

The members also carry the information on the matched clusters. The
column ``match`` shows to which clusters of the other catalog this
member also belongs. The column ``in_mt_sample`` indicates whether
those clusters are present in the matched sample:

.. code:: ipython3

    mt1.members

.. raw:: html members
tags: id(id), id_cluster(id_cluster), ra(ra), dec(dec), pmem(pmem)

ClData length=14

id	id_cluster	ra	dec	pmem	ind_cl	match	in_mt_sample
str5	str4	float64	float64	float64	int64	object	bool
MEM0	CL0a	0.0	0.0	1.0	0	['CL3b']	True
MEM1	CL0a	10.0	0.0	1.0	0	['CL0b']	True
MEM2	CL0a	20.0	0.0	1.0	0	['CL0b']	True
MEM3	CL0a	30.0	0.0	1.0	0	['CL0b']	True
MEM4	CL0a	40.0	0.0	1.0	0	['CL0b']	True
MEM5	CL1a	50.0	0.0	1.0	1	['CL1b']	True
MEM6	CL1a	60.0	0.0	1.0	1	['CL1b']	True
MEM7	CL1a	70.0	0.0	1.0	1	['CL1b']	True
MEM8	CL1a	80.0	0.0	1.0	1	['CL1b']	True
MEM9	CL2a	90.0	0.0	1.0	2	['CL2b']	True
MEM10	CL2a	100.0	0.0	1.0	2	['CL2b']	True
MEM11	CL2a	110.0	0.0	1.0	2	['CL2b']	True
MEM12	CL3a	120.0	0.0	1.0	3	['CL3b']	True
MEM13	CL3a	130.0	0.0	1.0	3	['CL3b']	True
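
To make the ``match`` column concrete, here is a minimal sketch in plain
Python of how it could be derived when members are matched by ``id``.
It is an illustration under that assumption, not ``clevar``'s code, and
the function name ``member_matches`` is hypothetical.

.. code:: python

    from collections import defaultdict

    def member_matches(members_self, members_other):
        """For each (member id, cluster id) in members_self, list the
        clusters of the other catalog that contain the same member id."""
        clusters_of = defaultdict(list)
        for mem_id, cl_id in members_other:
            if cl_id not in clusters_of[mem_id]:
                clusters_of[mem_id].append(cl_id)
        return [list(clusters_of.get(mem_id, [])) for mem_id, _ in members_self]

    # Member catalogs of this example as (member id, cluster id) pairs
    members1 = list(zip(
        [f'MEM{i}' for i in range(15)],
        ['CL0a']*5 + ['CL1a']*4 + ['CL2a']*3 + ['CL3a']*2 + ['CL4a']))
    members2 = list(zip(
        [f'MEM{i}' for i in range(14)],
        ['CL3b'] + ['CL0b']*4 + ['CL1b']*4 + ['CL2b']*3 + ['CL3b']*2))

    match = member_matches(members1, members2)

    # Keep only members whose parent cluster is in the matched sample.
    matched_clusters1 = {'CL0a', 'CL1a', 'CL2a', 'CL3a'}
    kept = [(mem_id, cl_id, mt)
            for (mem_id, cl_id), mt in zip(members1, match)
            if cl_id in matched_clusters1]
    print(len(kept))
    print(kept[0])

``MEM14`` belongs only to ``CL4a``, which has no match, so it does not
appear among the 14 members of the matched sample.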

Outputting matched catalogs
---------------------------

To save the current catalogs, you can use the built-in ``write``
method:

.. code:: ipython3

    c1.write('c1_temp.fits', overwrite=True)

This will allow you to save the catalog with its current labels and
matching information.

Outputting matching information to original catalogs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Assuming your input data came from initial files, ``clevar`` also
provides functions to create output files that combine all the
information in them with the matching results.

To add the matching information to an input catalog, use:

::

    from clevar.match import output_catalog_with_matching
    output_catalog_with_matching('input_catalog.fits', 'output_catalog.fits', c1)

- note: ``input_catalog.fits`` must have the same number of rows as
  ``c1``.

To create a matched catalog containing all columns of both input
catalogs, use:

::

    from clevar.match import output_matched_catalog
    output_matched_catalog('input_catalog1.fits', 'input_catalog2.fits',
        'output_catalog.fits', c1, c2, matching_type='cross')

where ``matching_type`` must be ``cross``, ``cat1`` or ``cat2``.

- note: ``input_catalog1.fits`` must have the same number of rows as
  ``c1`` (and the same for ``c2``).
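
Conceptually, a matched output catalog places the columns of both
inputs side by side for each matched pair. The sketch below illustrates
that idea in plain Python with rows stored as dictionaries. It is an
assumption-laden illustration, not the behaviour of
``output_matched_catalog``: the function name ``join_matched_rows`` and
the ``c1_``/``c2_`` column prefixes are hypothetical, and rows are
looked up by ``ID`` rather than by row position.

.. code:: python

    def join_matched_rows(rows1, rows2, pairs, prefix1='c1_', prefix2='c2_'):
        """Combine the columns of two catalogs for each matched pair of ids."""
        by_id1 = {row['ID']: row for row in rows1}
        by_id2 = {row['ID']: row for row in rows2}
        joined = []
        for id1, id2 in pairs:
            # Prefix the column names so both catalogs can coexist in one row.
            row = {prefix1 + key: val for key, val in by_id1[id1].items()}
            row.update({prefix2 + key: val for key, val in by_id2[id2].items()})
            joined.append(row)
        return joined

    # Input catalogs and cross matches of this example
    rows1 = [{'ID': f'CL{i}a', 'MASS': (i + 1)*1e15} for i in range(5)]
    rows2 = [{'ID': f'CL{i}b', 'MASS': (i + 1)*1e15} for i in range(4)]
    pairs = [('CL0a', 'CL0b'), ('CL1a', 'CL1b'),
             ('CL2a', 'CL2b'), ('CL3a', 'CL3b')]

    joined = join_matched_rows(rows1, rows2, pairs)
    print(len(joined))
    print(joined[0])

The unmatched ``CL4a`` is absent from the joined rows, since only
matched pairs are combined.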