Matching catalogs based on proximity (simple)¶

Matching two catalogs by proximity, driven by a configuration dictionary.

%load_ext autoreload
%autoreload 2

ClCatalogs¶

Start with some input data:

import numpy as np
from astropy.table import Table
input1 = Table({
    'ID': [f'CL{i}' for i in range(5)],
    'RA': [0.0, 0.0001, 0.00011, 25, 20],
    'DEC': [0.0, 0.0, 0.0, 0.0, 0.0],
    'Z': [0.2, 0.3, 0.25, 0.4, 0.35],
    'MASS': [10**13.5, 10**13.4, 10**13.3, 10**13.8, 10**14],
    'RADIUS_ARCMIN': [1.0, 1.0, 1.0, 1.0, 1.0],
})
input2 = Table({
    'ID': ['CL0', 'CL1', 'CL2', 'CL3'],
    'RA': [0.0, 0.0001, 0.00011, 25],
    'DEC': [0.0, 0, 0, 0],
    'Z': [0.3, 0.2, 0.25, 0.4],
    'MASS': [10**13.3, 10**13.4, 10**13.5, 10**13.8],
    'RADIUS_ARCMIN': [1.0, 1.0, 1.0, 1.0],
})
display(input1)
display(input2)

Table length=5

ID	RA	DEC	Z	MASS	RADIUS_ARCMIN
str3	float64	float64	float64	float64	float64
CL0	0.0	0.0	0.2	31622776601683.793	1.0
CL1	0.0001	0.0	0.3	25118864315095.82	1.0
CL2	0.00011	0.0	0.25	19952623149688.83	1.0
CL3	25.0	0.0	0.4	63095734448019.43	1.0
CL4	20.0	0.0	0.35	100000000000000.0	1.0

Table length=4

ID	RA	DEC	Z	MASS	RADIUS_ARCMIN
str3	float64	float64	float64	float64	float64
CL0	0.0	0.0	0.3	19952623149688.83	1.0
CL1	0.0001	0.0	0.2	25118864315095.82	1.0
CL2	0.00011	0.0	0.25	31622776601683.793	1.0
CL3	25.0	0.0	0.4	63095734448019.43	1.0

Create two ClCatalog objects. They have the same properties as astropy tables, with additional functionality. For proximity matching, the main columns to be included are:

- id - if not included, one will be assigned
- ra (in degrees) - necessary
- dec (in degrees) - necessary
- z - necessary if used as a matching criterion or for angular-to-physical conversion
- mass (or mass proxy) - necessary if used as a preference criterion for unique matches
- radius - necessary if used as a matching criterion (also requires radius_unit to be passed)

from clevar.catalog import ClCatalog
c1 = ClCatalog('Cat1', id=input1['ID'], ra=input1['RA'], dec=input1['DEC'], z=input1['Z'], mass=input1['MASS'])
c2 = ClCatalog('Cat2', id=input2['ID'], ra=input2['RA'], dec=input2['DEC'], z=input2['Z'], mass=input2['MASS'])
# Format for nice display
for c in ('ra', 'dec', 'z'):
    c1[c].info.format = '.2f'
    c2[c].info.format = '.2f'
for c in ('mass',):
    c1[c].info.format = '.2e'
    c2[c].info.format = '.2e'
display(c1)
display(c2)

Cat1
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None

ClData length=5

id	ra	dec	z	mass	mt_self	mt_other	mt_multi_self	mt_multi_other
str3	float64	float64	float64	float64	object	object	object	object
CL0	0.00	0.00	0.20	3.16e+13	None	None	[]	[]
CL1	0.00	0.00	0.30	2.51e+13	None	None	[]	[]
CL2	0.00	0.00	0.25	2.00e+13	None	None	[]	[]
CL3	25.00	0.00	0.40	6.31e+13	None	None	[]	[]
CL4	20.00	0.00	0.35	1.00e+14	None	None	[]	[]

Cat2
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None

ClData length=4

id	ra	dec	z	mass	mt_self	mt_other	mt_multi_self	mt_multi_other
str3	float64	float64	float64	float64	object	object	object	object
CL0	0.00	0.00	0.30	2.00e+13	None	None	[]	[]
CL1	0.00	0.00	0.20	2.51e+13	None	None	[]	[]
CL2	0.00	0.00	0.25	3.16e+13	None	None	[]	[]
CL3	25.00	0.00	0.40	6.31e+13	None	None	[]	[]

A ClCatalog object can also be read directly from a file. For details, see catalogs.ipynb.

Matching¶

Import ProximityMatch and create an object for matching:

from clevar.match import ProximityMatch
mt = ProximityMatch()

Prepare the configuration. The main values are:

type: Type of matching to be considered. Can be a simple match of ClCatalog1->ClCatalog2 (cat1), ClCatalog2->ClCatalog1 (cat2) or cross matching.
which_radius: Given a pair of clusters, which radius will be used for the matching.
preference: In cases where there are multiple matches, how the best candidate will be chosen.
verbose: Print result for individual matches (default=True).
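To make the preference option concrete, here is a minimal sketch of how each choice would rank the candidates of a single cluster. It is not ClEvaR's implementation: `flat_sky_sep` and `pick_best` are hypothetical helpers, and the values are copied from the first Cat1 cluster and the three nearby Cat2 clusters in the input tables.

```python
import math

# One Cat1 cluster and its Cat2 candidates (values from the input tables).
target = {"id": "CL0", "ra": 0.0, "dec": 0.0, "z": 0.2, "mass": 10**13.5}
candidates = [
    {"id": "CL0", "ra": 0.0, "dec": 0.0, "z": 0.3, "mass": 10**13.3},
    {"id": "CL1", "ra": 0.0001, "dec": 0.0, "z": 0.2, "mass": 10**13.4},
    {"id": "CL2", "ra": 0.00011, "dec": 0.0, "z": 0.25, "mass": 10**13.5},
]


def flat_sky_sep(a, b):
    """Small-angle separation in degrees (hypothetical helper, tiny offsets only)."""
    mean_dec = math.radians(0.5 * (a["dec"] + b["dec"]))
    dra = (a["ra"] - b["ra"]) * math.cos(mean_dec)
    return math.hypot(dra, a["dec"] - b["dec"])


def pick_best(target, candidates, preference):
    """Return the id of the preferred candidate (illustrative logic only)."""
    if preference == "angular_proximity":
        return min(candidates, key=lambda c: flat_sky_sep(target, c))["id"]
    if preference == "redshift_proximity":
        return min(candidates, key=lambda c: abs(target["z"] - c["z"]))["id"]
    if preference == "more_massive":
        return max(candidates, key=lambda c: c["mass"])["id"]
    raise ValueError(f"unknown preference: {preference}")


for pref in ("angular_proximity", "redshift_proximity", "more_massive"):
    print(pref, "->", pick_best(target, candidates, pref))
```

With these toy values each preference selects a different candidate, which is why the choice matters when several clusters overlap. The real unique-matching step also has to avoid assigning the same candidate twice, which this sketch ignores.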

We also need to provide some specific configuration for each catalog with:

delta_z: Defines redshift window for matching. The possible values are:
- 'cat': uses redshift properties of the catalog
- 'spline.filename': interpolates data in 'filename' assuming (z, zmin, zmax) format
- float: uses delta_z*(1+z)
- None: does not use z
match_radius: Radius of the catalog to be used in the matching. If 'cat', the radius in the catalog is used; otherwise it must be given in the format 'value unit' (e.g. '1 arcsec', '1 Mpc').
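As a small worked example of the float option for delta_z, the window is z ± delta_z*(1+z). The helper name `z_window` is an assumption made for this sketch and is not part of clevar.

```python
def z_window(z, delta_z):
    """Return (zmin, zmax) for a float delta_z, i.e. z -/+ delta_z * (1 + z)."""
    half_width = delta_z * (1.0 + z)
    return z - half_width, z + half_width


# Redshifts of the five Cat1 clusters, with delta_z = 0.2 as in the config.
for z in (0.2, 0.3, 0.25, 0.4, 0.35):
    zmin, zmax = z_window(z, 0.2)
    print(f"z={z:.2f} -> zmin={zmin:.2f}, zmax={zmax:.2f}")
```

These values can be compared with the zmin and zmax columns that appear in the catalogs after matching. Note that the lower bound can be negative for low-redshift clusters.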

In this case, because one of the configured radii is in physical units, we need a cosmology object to convert it to an angular size (this is done internally).

match_config = {
    'type': 'cross', # options are cross, cat1, cat2
    'which_radius': 'max', # Case of radius to be used, can be: cat1, cat2, min, max
    'preference': 'angular_proximity', # options are more_massive, angular_proximity or redshift_proximity
    'catalog1': {'delta_z':.2,
                'match_radius': '1 mpc'
                },
    'catalog2': {'delta_z':.2,
                'match_radius': '10 arcsec'
                }
}
from clevar.cosmology import AstroPyCosmology
cosmo = AstroPyCosmology()
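The conversion that the cosmology object makes possible can be sketched with the standard library alone. This is an illustration, not ClEvaR's code: it assumes a flat ΛCDM model with H0=70 and a total matter density of 0.3 (the sum of the Omega_dm0 and Omega_b0 values printed in the matching history), no radiation, and the hypothetical helpers `angular_diameter_distance` and `radius_to_degrees`.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def angular_diameter_distance(z, h0=70.0, omega_m=0.3, steps=1000):
    """D_A in Mpc for flat LambdaCDM, using Simpson's rule (steps must be even)."""
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))

    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    comoving = (C_KM_S / h0) * total * h / 3.0
    return comoving / (1.0 + z)


def radius_to_degrees(radius, z=None):
    """Convert a 'value unit' string to degrees (physical units need z > 0)."""
    value, unit = radius.split()
    value, unit = float(value), unit.lower()
    angular = {"degrees": 1.0, "arcmin": 1.0 / 60.0, "arcsec": 1.0 / 3600.0}
    if unit in angular:
        return value * angular[unit]
    if unit == "mpc":
        if z is None or z <= 0:
            raise ValueError("a positive redshift is needed for physical units")
        return math.degrees(value / angular_diameter_distance(z))
    raise ValueError(f"unsupported unit: {unit}")


print(radius_to_degrees("10 arcsec"))
print(radius_to_degrees("1 mpc", z=0.2))
```

Under these assumptions, 1 Mpc at z=0.2 comes out close to 0.084 degrees, so the physical radius of catalog 1 is far larger on the sky than the 10 arcsec used for catalog 2.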

Once the configuration is prepared, the whole process can be done with one call:

%%time
mt.match_from_config(c1, c2, match_config, cosmo=cosmo)

## ClCatalog 1
## Prep mt_cols
* zmin|zmax from config value
* ang radius from set scale

## ClCatalog 2
## Prep mt_cols
* zmin|zmax from config value
* ang radius from set scale

## Multiple match (catalog 1)
Finding candidates (Cat1)
* 4/5 objects matched.

## Multiple match (catalog 2)
Finding candidates (Cat2)
* 4/4 objects matched.

## Finding unique matches of catalog 1
Unique Matches (Cat1)
* 4/5 objects matched.

## Finding unique matches of catalog 2
Unique Matches (Cat2)
* 4/4 objects matched.
Cross Matches (Cat1)
* 4/5 objects matched.
Cross Matches (Cat2)
* 4/4 objects matched.
CPU times: user 97.2 ms, sys: 1.01 ms, total: 98.2 ms
Wall time: 96.7 ms
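The counts in the log can be reproduced with a rough, pure-Python sketch of the multiple-match step. This is not ClEvaR's implementation. It assumes that a pair matches when the angular separation is within the radius selected by which_radius and the two redshift windows overlap. The helpers `ang_sep_deg` and `multiple_match` are hypothetical, and the inputs are rounded values for this toy data set.

```python
import math


def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


# Rows are (id, ra, dec, zmin, zmax, matching radius in degrees).
cat1 = [
    ("CL0", 0.0, 0.0, -0.04, 0.44, 0.0842),
    ("CL1", 0.0001, 0.0, 0.04, 0.56, 0.0624),
    ("CL2", 0.00011, 0.0, 0.00, 0.50, 0.0710),
    ("CL3", 25.0, 0.0, 0.12, 0.68, 0.0517),
    ("CL4", 20.0, 0.0, 0.08, 0.62, 0.0562),
]
cat2 = [
    ("CL0", 0.0, 0.0, 0.04, 0.56, 10 / 3600),
    ("CL1", 0.0001, 0.0, -0.04, 0.44, 10 / 3600),
    ("CL2", 0.00011, 0.0, 0.00, 0.50, 10 / 3600),
    ("CL3", 25.0, 0.0, 0.12, 0.68, 10 / 3600),
]

PICK = {"min": min, "max": max}  # only these two options are sketched here


def multiple_match(cat_a, cat_b, which_radius="max"):
    """Map each id in cat_a to the list of candidate ids in cat_b."""
    matches = {}
    for id_a, ra_a, dec_a, zmin_a, zmax_a, ang_a in cat_a:
        matches[id_a] = [
            id_b
            for id_b, ra_b, dec_b, zmin_b, zmax_b, ang_b in cat_b
            if ang_sep_deg(ra_a, dec_a, ra_b, dec_b) <= PICK[which_radius](ang_a, ang_b)
            and zmin_a <= zmax_b and zmin_b <= zmax_a  # redshift windows overlap
        ]
    return matches


found = multiple_match(cat1, cat2)
print(found)
print(sum(bool(v) for v in found.values()), "of", len(cat1), "objects matched")
```

In this sketch the last Cat1 cluster has no candidate because the nearest Cat2 cluster is 5 degrees away, far beyond any of the matching radii.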

This will fill the matching columns in the catalogs:

- mt_multi_self: Multiple matches found
- mt_multi_other: Multiple matches found by the other catalog
- mt_self: Best candidate found
- mt_other: Best candidate found by the other catalog
- mt_cross: Best candidate found in both directions
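The relation between mt_self, mt_other and mt_cross can be illustrated with plain dictionaries. This is a sketch under the assumption that a cross match is kept only when both directions agree on the same partner. The function `cross_match` is hypothetical, and the second call uses invented values to show a disagreement.

```python
def cross_match(mt_self, mt_other):
    """Keep a match only if it was found in both directions (illustrative)."""
    return {
        cid: best if best is not None and best == mt_other.get(cid) else None
        for cid, best in mt_self.items()
    }


# Values for Cat1 in this notebook: both directions agree wherever a match exists.
mt_self = {"CL0": "CL0", "CL1": "CL1", "CL2": "CL2", "CL3": "CL3", "CL4": None}
mt_other = {"CL0": "CL0", "CL1": "CL1", "CL2": "CL2", "CL3": "CL3", "CL4": None}
print(cross_match(mt_self, mt_other))

# Invented case: CL1 prefers CL2, but the other catalog picked CL1 for it.
print(cross_match({"CL1": "CL2"}, {"CL1": "CL1"}))
```

An object without a match in either direction, like the last Cat1 cluster here, keeps None in all three columns.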

display(c1)
display(c2)

Cat1
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None

ClData length=5

										mt_input
id	ra	dec	z	mass	mt_self	mt_other	mt_multi_self	mt_multi_other	mt_cross	zmin	zmax	ang
str3	float64	float64	float64	float64	object	object	object	object	object	float64	float64	float64
CL0	0.00	0.00	0.20	3.16e+13	CL0	CL0	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL0	-0.04	0.44	0.08418388522320427
CL1	0.00	0.00	0.30	2.51e+13	CL1	CL1	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL1	0.04	0.56	0.062361611333396835
CL2	0.00	0.00	0.25	2.00e+13	CL2	CL2	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL2	0.00	0.50	0.0710414327593546
CL3	25.00	0.00	0.40	6.31e+13	CL3	CL3	['CL3']	['CL3']	CL3	0.12	0.68	0.05169945411341919
CL4	20.00	0.00	0.35	1.00e+14	None	None	[]	[]	None	0.08	0.62	0.05623291641697765

Cat2
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None

ClData length=4

										mt_input
id	ra	dec	z	mass	mt_self	mt_other	mt_multi_self	mt_multi_other	mt_cross	zmin	zmax	ang
str3	float64	float64	float64	float64	object	object	object	object	object	float64	float64	float64
CL0	0.00	0.00	0.30	2.00e+13	CL0	CL0	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL0	0.04	0.56	0.002777777777777778
CL1	0.00	0.00	0.20	2.51e+13	CL1	CL1	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL1	-0.04	0.44	0.002777777777777778
CL2	0.00	0.00	0.25	3.16e+13	CL2	CL2	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL2	0.00	0.50	0.002777777777777778
CL3	25.00	0.00	0.40	6.31e+13	CL3	CL3	['CL3']	['CL3']	CL3	0.12	0.68	0.002777777777777778

The steps of matching are stored in the catalogs and can be checked:

c1.show_mt_hist()

prep_cat_for_match(cat='Cat1', delta_z=0.2, match_radius='1 mpc', n_delta_z=1, n_match_radius=1,
                   cosmo='AstroPyCosmology(H0=70.0, Omega_dm0=0.25, Omega_b0=0.05, Omega_k0=0.0)')

prep_cat_for_match(cat='Cat2', delta_z=0.2, match_radius='10 arcsec', n_delta_z=1, n_match_radius=1,
                   cosmo='AstroPyCosmology(H0=70.0, Omega_dm0=0.25, Omega_b0=0.05, Omega_k0=0.0)')

multiple(cat1='Cat1', cat2='Cat2', radius_selection='max')

multiple(cat1='Cat2', cat2='Cat1', radius_selection='max')

unique(cat1='Cat1', cat2='Cat2', preference='angular_proximity')

unique(cat1='Cat2', cat2='Cat1', preference='angular_proximity')

Save and Load¶

The results of the matching can easily be saved and loaded using ClEvaR tools:

mt.save_matches(c1, c2, out_dir='temp', overwrite=True)

mt.load_matches(c1, c2, out_dir='temp')
display(c1)
display(c2)

Cat1
<< ClEvar used in matching: 0.13.2 >>
 * Total objects:    5
 * multiple (self):  4
 * multiple (other): 4
 * unique (self):    4
 * unique (other):   4
 * cross:            4

Cat2
<< ClEvar used in matching: 0.13.2 >>
 * Total objects:    4
 * multiple (self):  4
 * multiple (other): 4
 * unique (self):    4
 * unique (other):   4
 * cross:            4

Cat1
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None

ClData length=5

										mt_input
id	ra	dec	z	mass	mt_self	mt_other	mt_multi_self	mt_multi_other	mt_cross	zmin	zmax	ang
str3	float64	float64	float64	float64	object	object	object	object	object	float64	float64	float64
CL0	0.00	0.00	0.20	3.16e+13	CL0	CL0	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL0	-0.04	0.44	0.08418388522320427
CL1	0.00	0.00	0.30	2.51e+13	CL1	CL1	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL1	0.04	0.56	0.062361611333396835
CL2	0.00	0.00	0.25	2.00e+13	CL2	CL2	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL2	0.00	0.50	0.0710414327593546
CL3	25.00	0.00	0.40	6.31e+13	CL3	CL3	['CL3']	['CL3']	CL3	0.12	0.68	0.05169945411341919
CL4	20.00	0.00	0.35	1.00e+14	None	None	[]	[]	None	0.08	0.62	0.05623291641697765

Cat2
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None

ClData length=4

										mt_input
id	ra	dec	z	mass	mt_self	mt_other	mt_multi_self	mt_multi_other	mt_cross	zmin	zmax	ang
str3	float64	float64	float64	float64	object	object	object	object	object	float64	float64	float64
CL0	0.00	0.00	0.30	2.00e+13	CL0	CL0	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL0	0.04	0.56	0.002777777777777778
CL1	0.00	0.00	0.20	2.51e+13	CL1	CL1	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL1	-0.04	0.44	0.002777777777777778
CL2	0.00	0.00	0.25	3.16e+13	CL2	CL2	['CL0', 'CL1', 'CL2']	['CL0', 'CL1', 'CL2']	CL2	0.00	0.50	0.002777777777777778
CL3	25.00	0.00	0.40	6.31e+13	CL3	CL3	['CL3']	['CL3']	CL3	0.12	0.68	0.002777777777777778

Getting Matched Pairs¶

clevar has built-in functionality to plot some results of the matching, such as:

- Recovery rates
- Distances (angular and redshift) of cluster centers
- Scaling relations (mass, redshift, …)

For those cases, check the match_metrics.ipynb and match_metrics_advanced.ipynb notebooks.

If those do not meet your needs, you can get the matched pairs of clusters directly:

from clevar.match import get_matched_pairs
mt1, mt2 = get_matched_pairs(c1, c2, 'cross')

These will be catalogs with the corresponding matched pairs:

import matplotlib.pyplot as plt  # preferred over the legacy pylab interface
plt.scatter(mt1['mass'], mt2['mass'])

<matplotlib.collections.PathCollection at 0x7f340013e3d0>
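Conceptually, building matched pairs amounts to aligning rows of both catalogs through a matching column. The sketch below uses plain dictionaries instead of ClCatalog objects, and `matched_rows` is a hypothetical helper, not the clevar function used above.

```python
import math

cat1 = {
    "id": ["CL0", "CL1", "CL2", "CL3", "CL4"],
    "mass": [10**13.5, 10**13.4, 10**13.3, 10**13.8, 10**14],
    "mt_cross": ["CL0", "CL1", "CL2", "CL3", None],
}
cat2 = {
    "id": ["CL0", "CL1", "CL2", "CL3"],
    "mass": [10**13.3, 10**13.4, 10**13.5, 10**13.8],
}


def matched_rows(cat_a, cat_b, col="mt_cross"):
    """Return aligned row indices of cat_a and cat_b for matched objects."""
    index_b = {cid: i for i, cid in enumerate(cat_b["id"])}
    rows_a = [i for i, partner in enumerate(cat_a[col]) if partner is not None]
    rows_b = [index_b[cat_a[col][i]] for i in rows_a]
    return rows_a, rows_b


rows1, rows2 = matched_rows(cat1, cat2)
# Log mass offset of each pair, e.g. as input for a scaling-relation check.
offsets = [math.log10(cat2["mass"][j] / cat1["mass"][i]) for i, j in zip(rows1, rows2)]
print(rows1, rows2)
print([round(x, 2) for x in offsets])
```

Because the two outputs are aligned row by row, any column of one catalog can be compared directly with the same column of the other, as in the scatter plot of masses.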

Outputting matched catalogs¶

To save the current catalogs, you can use the built-in write function:

c1.write('c1_temp.fits', overwrite=True)

This will allow you to save the catalog with its current labels and matching information.

Outputting matching information to original catalogs¶

If your input data came from files, clevar also provides functions to create output files that combine all of their information with the matching results.

To add the matching information to an input catalog, use:

from clevar.match import output_catalog_with_matching
output_catalog_with_matching('input_catalog.fits', 'output_catalog.fits', c1)

note: input_catalog.fits must have the same number of rows as c1.

To create a matched catalog containing all columns of both input catalogs, use:

from clevar.match import output_matched_catalog
output_matched_catalog('input_catalog1.fits', 'input_catalog2.fits',
    'output_catalog.fits', c1, c2, matching_type='cross')

where matching_type must be cross, cat1 or cat2.

note: input_catalog1.fits must have the same number of rows as c1 (and likewise input_catalog2.fits and c2).