Matching catalogs based on proximity (detailed)¶
Here we show the specific steps of matching two catalogs based on proximity
Table of Contents
1 ClCatalogs
2 Matching
2.1 Prepare the catalogs
2.2 Multiple matching
2.3 Unique matching
2.4 Cross matching
3 Save and Load
4 Getting Matched Pairs
5 Outputing matched catalogs
5.1 Outputing matching information to original catalogs
%load_ext autoreload
%autoreload 2
ClCatalogs¶
Given some input data
import numpy as np
from astropy.table import Table
input1 = Table({
'ID': [f'CL{i}' for i in range(5)],
'RA': [0.0, 0.0001, 0.00011, 25, 20],
'DEC': [0.0, 0.0, 0.0, 0.0, 0.0],
'Z': [0.2, 0.3, 0.25, 0.4, 0.35],
'MASS': [10**13.5, 10**13.4, 10**13.3, 10**13.8, 10**14],
'RADIUS_ARCMIN': [1.0, 1.0, 1.0, 1.0, 1.0],
})
input2 = Table({
'ID': ['CL0', 'CL1', 'CL2', 'CL3'],
'RA': [0.0, 0.0001, 0.00011, 25],
'DEC': [0.0, 0, 0, 0],
'Z': [0.3, 0.2, 0.25, 0.4],
'MASS': [10**13.3, 10**13.4, 10**13.5, 10**13.8],
'RADIUS_ARCMIN': [1.0, 1.0, 1.0, 1.0],
})
display(input1)
display(input2)
| ID | RA | DEC | Z | MASS | RADIUS_ARCMIN |
|---|---|---|---|---|---|
| str3 | float64 | float64 | float64 | float64 | float64 |
| CL0 | 0.0 | 0.0 | 0.2 | 31622776601683.793 | 1.0 |
| CL1 | 0.0001 | 0.0 | 0.3 | 25118864315095.82 | 1.0 |
| CL2 | 0.00011 | 0.0 | 0.25 | 19952623149688.83 | 1.0 |
| CL3 | 25.0 | 0.0 | 0.4 | 63095734448019.43 | 1.0 |
| CL4 | 20.0 | 0.0 | 0.35 | 100000000000000.0 | 1.0 |
| ID | RA | DEC | Z | MASS | RADIUS_ARCMIN |
|---|---|---|---|---|---|
| str3 | float64 | float64 | float64 | float64 | float64 |
| CL0 | 0.0 | 0.0 | 0.3 | 19952623149688.83 | 1.0 |
| CL1 | 0.0001 | 0.0 | 0.2 | 25118864315095.82 | 1.0 |
| CL2 | 0.00011 | 0.0 | 0.25 | 31622776601683.793 | 1.0 |
| CL3 | 25.0 | 0.0 | 0.4 | 63095734448019.43 | 1.0 |
Create two ClCatalog objects, they have the same properties of
astropy tables with additional functionality. For the proximity
matching, the main columns to be included are: -
id - if not included, one will be assigned -ra(in degrees) - necessary -dec(in degrees) - necessary -z- necessary if used as matching criteria or for angular to physical convertion -mass(or mass proxy) - necessary if used as preference criteria for unique matches -radius- necessary if used as a criteria of matching (also requiresradius_unit`
to be passed)
from clevar.catalog import ClCatalog
c1 = ClCatalog('Cat1', id=input1['ID'], ra=input1['RA'], dec=input1['DEC'], z=input1['Z'], mass=input1['MASS'])
c2 = ClCatalog('Cat2', id=input2['ID'], ra=input2['RA'], dec=input2['DEC'], z=input2['Z'], mass=input2['MASS'])
# Format for nice display
for c in ('ra', 'dec', 'z'):
c1[c].info.format = '.2f'
c2[c].info.format = '.2f'
for c in ('mass',):
c1[c].info.format = '.2e'
c2[c].info.format = '.2e'
display(c1)
display(c2)
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None
| id | ra | dec | z | mass | mt_self | mt_other | mt_multi_self | mt_multi_other |
|---|---|---|---|---|---|---|---|---|
| str3 | float64 | float64 | float64 | float64 | object | object | object | object |
| CL0 | 0.00 | 0.00 | 0.20 | 3.16e+13 | None | None | [] | [] |
| CL1 | 0.00 | 0.00 | 0.30 | 2.51e+13 | None | None | [] | [] |
| CL2 | 0.00 | 0.00 | 0.25 | 2.00e+13 | None | None | [] | [] |
| CL3 | 25.00 | 0.00 | 0.40 | 6.31e+13 | None | None | [] | [] |
| CL4 | 20.00 | 0.00 | 0.35 | 1.00e+14 | None | None | [] | [] |
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None
| id | ra | dec | z | mass | mt_self | mt_other | mt_multi_self | mt_multi_other |
|---|---|---|---|---|---|---|---|---|
| str3 | float64 | float64 | float64 | float64 | object | object | object | object |
| CL0 | 0.00 | 0.00 | 0.30 | 2.00e+13 | None | None | [] | [] |
| CL1 | 0.00 | 0.00 | 0.20 | 2.51e+13 | None | None | [] | [] |
| CL2 | 0.00 | 0.00 | 0.25 | 3.16e+13 | None | None | [] | [] |
| CL3 | 25.00 | 0.00 | 0.40 | 6.31e+13 | None | None | [] | [] |
The ClCatalog object can also be read directly from a file, for
details, see catalogs.ipynb.
Matching¶
Import the ProximityMatch and create a object for matching
from clevar.match import ProximityMatch
mt = ProximityMatch()
Prepare the catalogs¶
The first step is to prepare each catalog with the matching configuration:
delta_z: Defines redshift window for matching. The possible values are:'cat': uses redshift properties of the catalog'spline.filename': interpolates data in'filename'assuming (z, zmin, zmax) formatfloat: usesdelta_z*(1+z)None: does not use z
match_radius: Radius of the catalog to be used in the matching. If'cat'uses the radius in the catalog, else must be in format'value unit'. (ex:'1 arcsec','1 Mpc')
In this case, because one of the configuraion radius has physical units,
we also need a cosmology (cosmo) object to convert it to angular
size (this is done internally).
from clevar.cosmology import AstroPyCosmology
mt_config1 = {'delta_z':.2,
'match_radius': '1 mpc',
'cosmo':AstroPyCosmology()}
mt_config2 = {'delta_z':.2,
'match_radius': '1 arcsec'}
mt.prep_cat_for_match(c1, **mt_config1)
mt.prep_cat_for_match(c2, **mt_config2)
## Prep mt_cols
* zmin|zmax from config value
* ang radius from set scale
## Prep mt_cols
* zmin|zmax from config value
* ang radius from set scale
This will add values to the mt_input attribute of the catalogs:
display(c1.mt_input)
display(c2.mt_input)
| zmin | zmax | ang |
|---|---|---|
| float64 | float64 | float64 |
| -0.04 | 0.44 | 0.08418388522320427 |
| 0.04 | 0.56 | 0.062361611333396835 |
| 0.00 | 0.50 | 0.0710414327593546 |
| 0.12 | 0.68 | 0.05169945411341919 |
| 0.08 | 0.62 | 0.05623291641697765 |
| zmin | zmax | ang |
|---|---|---|
| float64 | float64 | float64 |
| 0.04 | 0.56 | 0.0002777777777777778 |
| -0.04 | 0.44 | 0.0002777777777777778 |
| 0.00 | 0.50 | 0.0002777777777777778 |
| 0.12 | 0.68 | 0.0002777777777777778 |
Multiple matching¶
The next step is to match the catalogs and store all candidates that
pass the matching criteria. You can also pass the argument: -
radius_selection: Given a pair of clusters, which radius will be
used for the matching.
%%time
mt.multiple(c1, c2)
mt.multiple(c2, c1)
Finding candidates (Cat1)
* 4/5 objects matched.
Finding candidates (Cat2)
* 4/4 objects matched.
CPU times: user 28.5 ms, sys: 539 µs, total: 29.1 ms
Wall time: 28.4 ms
This will fill the mt_multi_self and mt_multi_other columns:
display(c1)
display(c2)
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None
| mt_input | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| id | ra | dec | z | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | zmin | zmax | ang |
| str3 | float64 | float64 | float64 | float64 | object | object | object | object | float64 | float64 | float64 |
| CL0 | 0.00 | 0.00 | 0.20 | 3.16e+13 | None | None | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | -0.04 | 0.44 | 0.08418388522320427 |
| CL1 | 0.00 | 0.00 | 0.30 | 2.51e+13 | None | None | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | 0.04 | 0.56 | 0.062361611333396835 |
| CL2 | 0.00 | 0.00 | 0.25 | 2.00e+13 | None | None | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | 0.00 | 0.50 | 0.0710414327593546 |
| CL3 | 25.00 | 0.00 | 0.40 | 6.31e+13 | None | None | ['CL3'] | ['CL3'] | 0.12 | 0.68 | 0.05169945411341919 |
| CL4 | 20.00 | 0.00 | 0.35 | 1.00e+14 | None | None | [] | [] | 0.08 | 0.62 | 0.05623291641697765 |
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None
| mt_input | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| id | ra | dec | z | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | zmin | zmax | ang |
| str3 | float64 | float64 | float64 | float64 | object | object | object | object | float64 | float64 | float64 |
| CL0 | 0.00 | 0.00 | 0.30 | 2.00e+13 | None | None | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | 0.04 | 0.56 | 0.0002777777777777778 |
| CL1 | 0.00 | 0.00 | 0.20 | 2.51e+13 | None | None | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | -0.04 | 0.44 | 0.0002777777777777778 |
| CL2 | 0.00 | 0.00 | 0.25 | 3.16e+13 | None | None | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | 0.00 | 0.50 | 0.0002777777777777778 |
| CL3 | 25.00 | 0.00 | 0.40 | 6.31e+13 | None | None | ['CL3'] | ['CL3'] | 0.12 | 0.68 | 0.0002777777777777778 |
Unique matching¶
Once all candidates are stored in each catalog, we can find the best
candidates. You can also pass the argument: - preference: In cases
where there are multiple matched, how the best candidate will be chosen.
%%time
mt.unique(c1, c2, preference='angular_proximity')
mt.unique(c2, c1, preference='angular_proximity')
Unique Matches (Cat1)
* 4/5 objects matched.
Unique Matches (Cat2)
* 4/4 objects matched.
CPU times: user 57.6 ms, sys: 5.88 ms, total: 63.5 ms
Wall time: 62 ms
This will fill the mt_self and mt_other columns:
display(c1)
display(c2)
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None
| mt_input | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| id | ra | dec | z | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | zmin | zmax | ang |
| str3 | float64 | float64 | float64 | float64 | object | object | object | object | float64 | float64 | float64 |
| CL0 | 0.00 | 0.00 | 0.20 | 3.16e+13 | CL0 | CL0 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | -0.04 | 0.44 | 0.08418388522320427 |
| CL1 | 0.00 | 0.00 | 0.30 | 2.51e+13 | CL1 | CL1 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | 0.04 | 0.56 | 0.062361611333396835 |
| CL2 | 0.00 | 0.00 | 0.25 | 2.00e+13 | CL2 | CL2 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | 0.00 | 0.50 | 0.0710414327593546 |
| CL3 | 25.00 | 0.00 | 0.40 | 6.31e+13 | CL3 | CL3 | ['CL3'] | ['CL3'] | 0.12 | 0.68 | 0.05169945411341919 |
| CL4 | 20.00 | 0.00 | 0.35 | 1.00e+14 | None | None | [] | [] | 0.08 | 0.62 | 0.05623291641697765 |
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None
| mt_input | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| id | ra | dec | z | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | zmin | zmax | ang |
| str3 | float64 | float64 | float64 | float64 | object | object | object | object | float64 | float64 | float64 |
| CL0 | 0.00 | 0.00 | 0.30 | 2.00e+13 | CL0 | CL0 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | 0.04 | 0.56 | 0.0002777777777777778 |
| CL1 | 0.00 | 0.00 | 0.20 | 2.51e+13 | CL1 | CL1 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | -0.04 | 0.44 | 0.0002777777777777778 |
| CL2 | 0.00 | 0.00 | 0.25 | 3.16e+13 | CL2 | CL2 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | 0.00 | 0.50 | 0.0002777777777777778 |
| CL3 | 25.00 | 0.00 | 0.40 | 6.31e+13 | CL3 | CL3 | ['CL3'] | ['CL3'] | 0.12 | 0.68 | 0.0002777777777777778 |
Cross matching¶
If you want to make sure the same pair was found in both directions:
c1.cross_match()
c2.cross_match()
This will fill the mt_cross column:
display(c1)
display(c2)
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None
| mt_input | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| id | ra | dec | z | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | mt_cross | zmin | zmax | ang |
| str3 | float64 | float64 | float64 | float64 | object | object | object | object | object | float64 | float64 | float64 |
| CL0 | 0.00 | 0.00 | 0.20 | 3.16e+13 | CL0 | CL0 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL0 | -0.04 | 0.44 | 0.08418388522320427 |
| CL1 | 0.00 | 0.00 | 0.30 | 2.51e+13 | CL1 | CL1 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL1 | 0.04 | 0.56 | 0.062361611333396835 |
| CL2 | 0.00 | 0.00 | 0.25 | 2.00e+13 | CL2 | CL2 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL2 | 0.00 | 0.50 | 0.0710414327593546 |
| CL3 | 25.00 | 0.00 | 0.40 | 6.31e+13 | CL3 | CL3 | ['CL3'] | ['CL3'] | CL3 | 0.12 | 0.68 | 0.05169945411341919 |
| CL4 | 20.00 | 0.00 | 0.35 | 1.00e+14 | None | None | [] | [] | None | 0.08 | 0.62 | 0.05623291641697765 |
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None
| mt_input | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| id | ra | dec | z | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | mt_cross | zmin | zmax | ang |
| str3 | float64 | float64 | float64 | float64 | object | object | object | object | object | float64 | float64 | float64 |
| CL0 | 0.00 | 0.00 | 0.30 | 2.00e+13 | CL0 | CL0 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL0 | 0.04 | 0.56 | 0.0002777777777777778 |
| CL1 | 0.00 | 0.00 | 0.20 | 2.51e+13 | CL1 | CL1 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL1 | -0.04 | 0.44 | 0.0002777777777777778 |
| CL2 | 0.00 | 0.00 | 0.25 | 3.16e+13 | CL2 | CL2 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL2 | 0.00 | 0.50 | 0.0002777777777777778 |
| CL3 | 25.00 | 0.00 | 0.40 | 6.31e+13 | CL3 | CL3 | ['CL3'] | ['CL3'] | CL3 | 0.12 | 0.68 | 0.0002777777777777778 |
Save and Load¶
The results of the matching can easily be saved and load using
ClEvaR tools:
mt.save_matches(c1, c2, out_dir='temp', overwrite=True)
mt.load_matches(c1, c2, out_dir='temp')
display(c1)
display(c2)
Cat1
<< ClEvar used in matching: 0.13.2 >>
* Total objects: 5
* multiple (self): 4
* multiple (other): 4
* unique (self): 4
* unique (other): 4
* cross: 4
Cat2
<< ClEvar used in matching: 0.13.2 >>
* Total objects: 4
* multiple (self): 4
* multiple (other): 4
* unique (self): 4
* unique (other): 4
* cross: 4
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None
| mt_input | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| id | ra | dec | z | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | mt_cross | zmin | zmax | ang |
| str3 | float64 | float64 | float64 | float64 | object | object | object | object | object | float64 | float64 | float64 |
| CL0 | 0.00 | 0.00 | 0.20 | 3.16e+13 | CL0 | CL0 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL0 | -0.04 | 0.44 | 0.08418388522320427 |
| CL1 | 0.00 | 0.00 | 0.30 | 2.51e+13 | CL1 | CL1 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL1 | 0.04 | 0.56 | 0.062361611333396835 |
| CL2 | 0.00 | 0.00 | 0.25 | 2.00e+13 | CL2 | CL2 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL2 | 0.00 | 0.50 | 0.0710414327593546 |
| CL3 | 25.00 | 0.00 | 0.40 | 6.31e+13 | CL3 | CL3 | ['CL3'] | ['CL3'] | CL3 | 0.12 | 0.68 | 0.05169945411341919 |
| CL4 | 20.00 | 0.00 | 0.35 | 1.00e+14 | None | None | [] | [] | None | 0.08 | 0.62 | 0.05623291641697765 |
tags: id(id), ra(ra), dec(dec), z(z), mass(mass)
Radius unit: None
| mt_input | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| id | ra | dec | z | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | mt_cross | zmin | zmax | ang |
| str3 | float64 | float64 | float64 | float64 | object | object | object | object | object | float64 | float64 | float64 |
| CL0 | 0.00 | 0.00 | 0.30 | 2.00e+13 | CL0 | CL0 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL0 | 0.04 | 0.56 | 0.0002777777777777778 |
| CL1 | 0.00 | 0.00 | 0.20 | 2.51e+13 | CL1 | CL1 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL1 | -0.04 | 0.44 | 0.0002777777777777778 |
| CL2 | 0.00 | 0.00 | 0.25 | 3.16e+13 | CL2 | CL2 | ['CL2', 'CL1', 'CL0'] | ['CL2', 'CL1', 'CL0'] | CL2 | 0.00 | 0.50 | 0.0002777777777777778 |
| CL3 | 25.00 | 0.00 | 0.40 | 6.31e+13 | CL3 | CL3 | ['CL3'] | ['CL3'] | CL3 | 0.12 | 0.68 | 0.0002777777777777778 |
Getting Matched Pairs¶
There is functionality inbuilt in clevar to plot some results of the
matching, such as: - Recovery rates - Distances (anguar and redshift) of
cluster centers - Scaling relations (mass, redshift, …) for those cases,
check the match_metrics.ipynb and match_metrics_advanced.ipynb
notebooks.
If those do not provide your needs, you can get directly the matched pairs of clusters:
from clevar.match import get_matched_pairs
mt1, mt2 = get_matched_pairs(c1, c2, 'cross')
These will be catalogs with the corresponding matched pairs:
import pylab as plt
plt.scatter(mt1['mass'], mt2['mass'])
<matplotlib.collections.PathCollection at 0x7fafbd269700>
Outputing matched catalogs¶
To save the current catalogs, you can use the write inbuilt
function:
c1.write('c1_temp.fits', overwrite=True)
This will allow you to save the catalog with its current labels and matching information.
Outputing matching information to original catalogs¶
Assuming your input data came from initial files, clevar also
provides functions create output files that combine all the information
on them with the matching results.
To add the matching information to an input catalog, use:
from clevar.match import output_catalog_with_matching
output_catalog_with_matching('input_catalog.fits', 'output_catalog.fits', c1)
note:
input_catalog.fitsmust have the same number of rows thatc1.
To create a matched catalog containig all columns of both input catalogs, use:
from clevar.match import output_matched_catalog
output_matched_catalog('input_catalog1.fits', 'input_catalog2.fits',
'output_catalog.fits', c1, c2, matching_type='cross')
where matching_type must be cross, cat1 or cat2.
note:
input_catalog1.fitsmust have the same number of rows thatc1(and the same forc2).