Matching catalogs based on membership (simple)¶
Matching two catalogs based on membership using a configuration dictionary
%load_ext autoreload
%autoreload 2
ClCatalogs¶
Given some input data
import numpy as np
from astropy.table import Table
input1 = Table({'ID': ['CL0a', 'CL1a', 'CL2a', 'CL3a', 'CL4a']})
input1['MASS'] = 1e14*np.arange(1, 6)*10
input2 = Table({'ID': ['CL0b', 'CL1b', 'CL2b', 'CL3b']})
input2['MASS'] = 1e14*np.arange(1, 5)*10
display(input1)
display(input2)
input1_mem = Table(
{'ID':[
'MEM0', 'MEM1', 'MEM2', 'MEM3', 'MEM4',
'MEM5', 'MEM6', 'MEM7', 'MEM8', 'MEM9',
'MEM10', 'MEM11', 'MEM12', 'MEM13', 'MEM14'],
'ID_CLUSTER': [
'CL0a', 'CL0a', 'CL0a', 'CL0a', 'CL0a',
'CL1a', 'CL1a', 'CL1a', 'CL1a', 'CL2a',
'CL2a', 'CL2a', 'CL3a', 'CL3a', 'CL4a'],
})
input2_mem = Table(
{'ID':[
'MEM0', 'MEM1', 'MEM2', 'MEM3', 'MEM4',
'MEM5', 'MEM6', 'MEM7', 'MEM8', 'MEM9',
'MEM10', 'MEM11', 'MEM12', 'MEM13'],
'ID_CLUSTER': [
'CL3b', 'CL0b', 'CL0b', 'CL0b', 'CL0b',
'CL1b', 'CL1b', 'CL1b', 'CL1b', 'CL2b',
'CL2b', 'CL2b', 'CL3b', 'CL3b'],
})
input1_mem['RA'] = np.arange(len(input1_mem))*10.0
input2_mem['RA'] = np.arange(len(input2_mem))*10.0
input1_mem['DEC'] = 0.0
input2_mem['DEC'] = 0.0
input1_mem['Z'] = 0.1
input2_mem['Z'] = 0.1
input1_mem['PMEM'] = 1.0
input2_mem['PMEM'] = 1.0
display(input1_mem)
display(input2_mem)
ID | MASS |
---|---|
str4 | float64 |
CL0a | 1000000000000000.0 |
CL1a | 2000000000000000.0 |
CL2a | 3000000000000000.0 |
CL3a | 4000000000000000.0 |
CL4a | 5000000000000000.0 |
ID | MASS |
---|---|
str4 | float64 |
CL0b | 1000000000000000.0 |
CL1b | 2000000000000000.0 |
CL2b | 3000000000000000.0 |
CL3b | 4000000000000000.0 |
ID | ID_CLUSTER | RA | DEC | Z | PMEM |
---|---|---|---|---|---|
str5 | str4 | float64 | float64 | float64 | float64 |
MEM0 | CL0a | 0.0 | 0.0 | 0.1 | 1.0 |
MEM1 | CL0a | 10.0 | 0.0 | 0.1 | 1.0 |
MEM2 | CL0a | 20.0 | 0.0 | 0.1 | 1.0 |
MEM3 | CL0a | 30.0 | 0.0 | 0.1 | 1.0 |
MEM4 | CL0a | 40.0 | 0.0 | 0.1 | 1.0 |
MEM5 | CL1a | 50.0 | 0.0 | 0.1 | 1.0 |
MEM6 | CL1a | 60.0 | 0.0 | 0.1 | 1.0 |
MEM7 | CL1a | 70.0 | 0.0 | 0.1 | 1.0 |
MEM8 | CL1a | 80.0 | 0.0 | 0.1 | 1.0 |
MEM9 | CL2a | 90.0 | 0.0 | 0.1 | 1.0 |
MEM10 | CL2a | 100.0 | 0.0 | 0.1 | 1.0 |
MEM11 | CL2a | 110.0 | 0.0 | 0.1 | 1.0 |
MEM12 | CL3a | 120.0 | 0.0 | 0.1 | 1.0 |
MEM13 | CL3a | 130.0 | 0.0 | 0.1 | 1.0 |
MEM14 | CL4a | 140.0 | 0.0 | 0.1 | 1.0 |
ID | ID_CLUSTER | RA | DEC | Z | PMEM |
---|---|---|---|---|---|
str5 | str4 | float64 | float64 | float64 | float64 |
MEM0 | CL3b | 0.0 | 0.0 | 0.1 | 1.0 |
MEM1 | CL0b | 10.0 | 0.0 | 0.1 | 1.0 |
MEM2 | CL0b | 20.0 | 0.0 | 0.1 | 1.0 |
MEM3 | CL0b | 30.0 | 0.0 | 0.1 | 1.0 |
MEM4 | CL0b | 40.0 | 0.0 | 0.1 | 1.0 |
MEM5 | CL1b | 50.0 | 0.0 | 0.1 | 1.0 |
MEM6 | CL1b | 60.0 | 0.0 | 0.1 | 1.0 |
MEM7 | CL1b | 70.0 | 0.0 | 0.1 | 1.0 |
MEM8 | CL1b | 80.0 | 0.0 | 0.1 | 1.0 |
MEM9 | CL2b | 90.0 | 0.0 | 0.1 | 1.0 |
MEM10 | CL2b | 100.0 | 0.0 | 0.1 | 1.0 |
MEM11 | CL2b | 110.0 | 0.0 | 0.1 | 1.0 |
MEM12 | CL3b | 120.0 | 0.0 | 0.1 | 1.0 |
MEM13 | CL3b | 130.0 | 0.0 | 0.1 | 1.0 |
Create two ClCatalog objects. They have the same properties as astropy tables, with additional functionality. For membership matching, the main columns to be included are:
- id: must correspond to id_cluster in the cluster member catalog.
- mass (or mass proxy): necessary for matching if shared_member_fraction is used as the preference criterion for unique matches (the default).
All of the columns can be added when creating the ClCatalog object by passing them as keyword arguments:
cat = ClCatalog('Cat', ra=[0, 1])
or can also be added afterwards:
cat = ClCatalog('Cat')
cat['ra'] = [0, 1]
from clevar.catalog import ClCatalog
c1 = ClCatalog('Cat1', id=input1['ID'], mass=input1['MASS'])
c2 = ClCatalog('Cat2', id=input2['ID'], mass=input2['MASS'])
# Format for nice display
c1['mass'].info.format = '.2e'
c2['mass'].info.format = '.2e'
display(c1)
display(c2)
tags: id(id), mass(mass)
Radius unit: None
id | mass | mt_self | mt_other | mt_multi_self | mt_multi_other |
---|---|---|---|---|---|
str4 | float64 | object | object | object | object |
CL0a | 1.00e+15 | None | None | [] | [] |
CL1a | 2.00e+15 | None | None | [] | [] |
CL2a | 3.00e+15 | None | None | [] | [] |
CL3a | 4.00e+15 | None | None | [] | [] |
CL4a | 5.00e+15 | None | None | [] | [] |
tags: id(id), mass(mass)
Radius unit: None
id | mass | mt_self | mt_other | mt_multi_self | mt_multi_other |
---|---|---|---|---|---|
str4 | float64 | object | object | object | object |
CL0b | 1.00e+15 | None | None | [] | [] |
CL1b | 2.00e+15 | None | None | [] | [] |
CL2b | 3.00e+15 | None | None | [] | [] |
CL3b | 4.00e+15 | None | None | [] | [] |
The members can be added to the cluster object using the add_members function. Its call format is similar to instantiating a ClCatalog object, where the columns are added by keyword arguments (the key id_cluster is always necessary and must correspond to id in the main cluster catalog).
c1.add_members(id=input1_mem['ID'], id_cluster=input1_mem['ID_CLUSTER'],
ra=input1_mem['RA'], dec=input1_mem['DEC'], pmem=input1_mem['PMEM'])
c2.add_members(id=input2_mem['ID'], id_cluster=input2_mem['ID_CLUSTER'],
ra=input2_mem['RA'], dec=input2_mem['DEC'], pmem=input2_mem['PMEM'])
display(c1.members)
display(c2.members)
tags: id(id), id_cluster(id_cluster), ra(ra), dec(dec), pmem(pmem)
id | id_cluster | ra | dec | pmem | ind_cl |
---|---|---|---|---|---|
str5 | str4 | float64 | float64 | float64 | int64 |
MEM0 | CL0a | 0.0 | 0.0 | 1.0 | 0 |
MEM1 | CL0a | 10.0 | 0.0 | 1.0 | 0 |
MEM2 | CL0a | 20.0 | 0.0 | 1.0 | 0 |
MEM3 | CL0a | 30.0 | 0.0 | 1.0 | 0 |
MEM4 | CL0a | 40.0 | 0.0 | 1.0 | 0 |
MEM5 | CL1a | 50.0 | 0.0 | 1.0 | 1 |
MEM6 | CL1a | 60.0 | 0.0 | 1.0 | 1 |
MEM7 | CL1a | 70.0 | 0.0 | 1.0 | 1 |
MEM8 | CL1a | 80.0 | 0.0 | 1.0 | 1 |
MEM9 | CL2a | 90.0 | 0.0 | 1.0 | 2 |
MEM10 | CL2a | 100.0 | 0.0 | 1.0 | 2 |
MEM11 | CL2a | 110.0 | 0.0 | 1.0 | 2 |
MEM12 | CL3a | 120.0 | 0.0 | 1.0 | 3 |
MEM13 | CL3a | 130.0 | 0.0 | 1.0 | 3 |
MEM14 | CL4a | 140.0 | 0.0 | 1.0 | 4 |
tags: id(id), id_cluster(id_cluster), ra(ra), dec(dec), pmem(pmem)
id | id_cluster | ra | dec | pmem | ind_cl |
---|---|---|---|---|---|
str5 | str4 | float64 | float64 | float64 | int64 |
MEM0 | CL3b | 0.0 | 0.0 | 1.0 | 3 |
MEM1 | CL0b | 10.0 | 0.0 | 1.0 | 0 |
MEM2 | CL0b | 20.0 | 0.0 | 1.0 | 0 |
MEM3 | CL0b | 30.0 | 0.0 | 1.0 | 0 |
MEM4 | CL0b | 40.0 | 0.0 | 1.0 | 0 |
MEM5 | CL1b | 50.0 | 0.0 | 1.0 | 1 |
MEM6 | CL1b | 60.0 | 0.0 | 1.0 | 1 |
MEM7 | CL1b | 70.0 | 0.0 | 1.0 | 1 |
MEM8 | CL1b | 80.0 | 0.0 | 1.0 | 1 |
MEM9 | CL2b | 90.0 | 0.0 | 1.0 | 2 |
MEM10 | CL2b | 100.0 | 0.0 | 1.0 | 2 |
MEM11 | CL2b | 110.0 | 0.0 | 1.0 | 2 |
MEM12 | CL3b | 120.0 | 0.0 | 1.0 | 3 |
MEM13 | CL3b | 130.0 | 0.0 | 1.0 | 3 |
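The ind_cl column in the outputs above appears to hold, for each member, the row index of its parent cluster in the main catalog. A minimal pure-Python sketch of that lookup, under this assumption (the helper name cluster_row_index is hypothetical and is not part of clevar):

```python
# Sketch only: reproduces the ind_cl column shown above, assuming it is the
# row index of the parent cluster. `cluster_row_index` is a hypothetical helper.
def cluster_row_index(cluster_ids, member_cluster_ids):
    """Return, for each member, the row index of its parent cluster."""
    index_of = {cl_id: i for i, cl_id in enumerate(cluster_ids)}
    # Every id_cluster must correspond to an id in the main cluster catalog.
    missing = sorted(set(member_cluster_ids) - set(index_of))
    if missing:
        raise ValueError(f"id_cluster values not in the cluster catalog: {missing}")
    return [index_of[cl_id] for cl_id in member_cluster_ids]


# Values copied from input2 and input2_mem above.
cluster_ids = ['CL0b', 'CL1b', 'CL2b', 'CL3b']
member_cluster_ids = [
    'CL3b', 'CL0b', 'CL0b', 'CL0b', 'CL0b',
    'CL1b', 'CL1b', 'CL1b', 'CL1b', 'CL2b',
    'CL2b', 'CL2b', 'CL3b', 'CL3b']

ind_cl = cluster_row_index(cluster_ids, member_cluster_ids)
print(ind_cl)
```

A member whose id_cluster has no counterpart in the cluster catalog makes this sketch raise an error; how clevar itself handles that case is not shown here.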
The catalogs can also be read directly from files; for more details, see catalogs.ipynb.
Matching¶
Import MembershipMatch and create an object for matching:
from clevar.match import MembershipMatch
mt = MembershipMatch()
Prepare the configuration. The main values are:
- type: Type of matching to be considered. Can be a simple match of ClCatalog1->ClCatalog2 (cat1), ClCatalog2->ClCatalog1 (cat2) or cross matching (cross).
- preference: In cases where there are multiple matches, how the best candidate will be chosen.
- minimum_share_fraction1: Minimum share fraction of catalog 1 to consider in matches (default=0).
- minimum_share_fraction2: Minimum share fraction of catalog 2 to consider in matches (default=0).
- match_members: Match the members catalogs (default=True), necessary if not already made.
- match_members_kwargs: Dictionary of arguments to match members, needed if match_members=True. Keys are:
  - method (str): Method for matching. Options are id or angular_distance.
  - radius (str, None): For method='angular_distance'. Radius for matching, with format 'value unit' (ex: 1 arcsec, 1 Mpc).
  - cosmo (clevar.Cosmology, None): For method='angular_distance'. Cosmology object for when radius has physical units.
- match_members_save: Saves file with matched members (default=False).
- match_members_load: Loads matched members (default=False); if True, skips matching (and save) of members.
- match_members_file: File to save matching of members, needed if match_members_save or match_members_load is True.
- shared_members_fill: Adds shared members dicts and nmem to mt_input in catalogs (default=True), necessary if not already made.
- shared_members_save: Saves files with shared members (default=False).
- shared_members_load: Loads files with shared members (default=False); if True, skips matching (and save) of members and fill (and save) of shared members.
- shared_members_file: Prefix of file names to save shared members, needed if shared_members_save or shared_members_load is True.
- verbose: Print result for individual matches (default=True).
match_config = {
'type': 'cross', # options are cross, cat1, cat2
'preference': 'shared_member_fraction', # other options are more_massive, angular_proximity or redshift_proximity
'minimum_share_fraction': 0,
'match_members_kwargs': {'method':'id'},
}
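As an illustration of the other documented options, the sketch below builds a variant of the configuration that matches members by position instead of by id and saves the member matching to a file. The key names and option values are the ones listed above; the variable name match_config_angular and the file name are hypothetical placeholders, and this variant has not been run against the example catalogs.

```python
# Sketch only: a configuration variant built from the documented keys.
# The file name below is a hypothetical placeholder.
match_config_angular = {
    'type': 'cross',  # options are cross, cat1, cat2
    'preference': 'shared_member_fraction',
    'match_members_kwargs': {
        'method': 'angular_distance',  # match members by sky position
        'radius': '1 arcsec',          # format is 'value unit'
    },
    'match_members_save': True,  # write the member matching to a file
    'match_members_file': 'temp_mem_match.txt',
    'verbose': True,
}

# Basic sanity checks on the documented constraints.
assert match_config_angular['type'] in ('cross', 'cat1', 'cat2')
if match_config_angular.get('match_members_save'):
    assert 'match_members_file' in match_config_angular
```

Because the radius here is angular, no cosmo entry is included; according to the list above, a cosmology object is only needed when the radius has physical units such as Mpc. The dictionary would be passed to mt.match_from_config(c1, c2, match_config_angular) in the same way as match_config.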
Once the configuration is prepared, the whole process can be done with one call:
%%time
mt.match_from_config(c1, c2, match_config)
28 members were matched.
## Multiple match (catalog 1)
Finding candidates (Cat1)
* 4/5 objects matched.
## Multiple match (catalog 2)
Finding candidates (Cat2)
* 4/4 objects matched.
## Finding unique matches of catalog 1
Unique Matches (Cat1)
* 4/5 objects matched.
## Finding unique matches of catalog 2
Unique Matches (Cat2)
* 4/4 objects matched.
Cross Matches (Cat1)
* 4/5 objects matched.
Cross Matches (Cat2)
* 4/4 objects matched.
CPU times: user 8.34 ms, sys: 438 µs, total: 8.78 ms
Wall time: 8.12 ms
This will fill the matching columns in the catalogs:
- mt_multi_self: Multiple matches found
- mt_multi_other: Multiple matches found by the other catalog
- mt_self: Best candidate found
- mt_other: Best candidate found by the other catalog
- mt_frac_self: Fraction of shared members with the best candidate found
- mt_frac_other: Fraction of shared members by the best candidate found by the other catalog, relative to the other catalog
- mt_cross: Best candidate found in both directions
If pmem is present in the members catalogs, the shared fractions are computed by:
\(\frac{\sum_{shared\;members}Pmem_i}{\sum_{cluster\;members}Pmem_i}\)
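A minimal pure-Python sketch of this formula, applied to a few clusters of the example data. The helper shared_fraction is hypothetical and only illustrates the arithmetic; it is not a clevar function.

```python
# Sketch only: pmem-weighted shared fraction, following the formula above.
# `shared_fraction` is a hypothetical helper, not part of clevar.
def shared_fraction(cluster_members, other_cluster_members):
    """Both arguments are dicts {member_id: pmem}.

    Returns (sum of pmem over shared members) / (sum of pmem over all
    members of the first cluster).
    """
    total = sum(cluster_members.values())
    if total == 0:
        return 0.0
    shared = sum(pmem for mem_id, pmem in cluster_members.items()
                 if mem_id in other_cluster_members)
    return shared / total


# Members copied from input1_mem and input2_mem (all with pmem = 1).
cl0a = {f'MEM{i}': 1.0 for i in range(0, 5)}  # MEM0..MEM4
cl0b = {f'MEM{i}': 1.0 for i in range(1, 5)}  # MEM1..MEM4
cl3a = {'MEM12': 1.0, 'MEM13': 1.0}
cl3b = {'MEM0': 1.0, 'MEM12': 1.0, 'MEM13': 1.0}

frac_0a = shared_fraction(cl0a, cl0b)  # 4 of 5 members shared -> 0.8
frac_0b = shared_fraction(cl0b, cl0a)  # 4 of 4 members shared -> 1.0
frac_3b = shared_fraction(cl3b, cl3a)  # 2 of 3 members shared -> 0.666...
print(frac_0a, frac_0b, frac_3b)
```

These values agree with the mt_frac_self and mt_frac_other entries reported for CL0a, CL0b and CL3b in the matched catalogs.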
display(c1)
display(c2)
tags: id(id), mass(mass)
Radius unit: None
mt_input | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
id | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | mt_frac_self | mt_frac_other | mt_cross | share_mems | nmem |
str4 | float64 | object | object | object | object | float64 | float64 | object | object | float64 |
CL0a | 1.00e+15 | CL0b | CL0b | ['CL3b', 'CL0b'] | ['CL3b', 'CL0b'] | 0.8 | 1.0 | CL0b | {'CL3b': 1.0, 'CL0b': 4.0} | 5.0 |
CL1a | 2.00e+15 | CL1b | CL1b | ['CL1b'] | ['CL1b'] | 1.0 | 1.0 | CL1b | {'CL1b': 4.0} | 4.0 |
CL2a | 3.00e+15 | CL2b | CL2b | ['CL2b'] | ['CL2b'] | 1.0 | 1.0 | CL2b | {'CL2b': 3.0} | 3.0 |
CL3a | 4.00e+15 | CL3b | CL3b | ['CL3b'] | ['CL3b'] | 1.0 | 0.6666666666666666 | CL3b | {'CL3b': 2.0} | 2.0 |
CL4a | 5.00e+15 | None | None | [] | [] | 0.0 | 0.0 | None | {} | 1.0 |
tags: id(id), mass(mass)
Radius unit: None
mt_input | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
id | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | mt_frac_other | mt_frac_self | mt_cross | share_mems | nmem |
str4 | float64 | object | object | object | object | float64 | float64 | object | object | float64 |
CL0b | 1.00e+15 | CL0a | CL0a | ['CL0a'] | ['CL0a'] | 0.8 | 1.0 | CL0a | {'CL0a': 4.0} | 4.0 |
CL1b | 2.00e+15 | CL1a | CL1a | ['CL1a'] | ['CL1a'] | 1.0 | 1.0 | CL1a | {'CL1a': 4.0} | 4.0 |
CL2b | 3.00e+15 | CL2a | CL2a | ['CL2a'] | ['CL2a'] | 1.0 | 1.0 | CL2a | {'CL2a': 3.0} | 3.0 |
CL3b | 4.00e+15 | CL3a | CL3a | ['CL3a', 'CL0a'] | ['CL3a', 'CL0a'] | 1.0 | 0.6666666666666666 | CL3a | {'CL3a': 2.0, 'CL0a': 1.0} | 3.0 |
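The unique and cross columns above can be understood with a short sketch that starts from the share_mems and nmem values in the mt_input columns. It assumes preference='shared_member_fraction' and deliberately ignores details such as tie-breaking and the bookkeeping that prevents one candidate from being assigned twice; the helper names are hypothetical and clevar's actual implementation may differ.

```python
# Sketch only: unique and cross steps for preference='shared_member_fraction'.
# Helper names are hypothetical; clevar's implementation may differ.
def best_candidate(share_mems, nmem, minimum_share_fraction=0.0):
    """Return the candidate id with the largest shared fraction, or None."""
    best_id, best_frac = None, 0.0
    for other_id, shared in share_mems.items():
        frac = shared / nmem
        if frac >= minimum_share_fraction and frac > best_frac:
            best_id, best_frac = other_id, frac
    return best_id


# (share_mems, nmem) copied from the mt_input columns above.
cat1 = {
    'CL0a': ({'CL3b': 1.0, 'CL0b': 4.0}, 5.0),
    'CL1a': ({'CL1b': 4.0}, 4.0),
    'CL2a': ({'CL2b': 3.0}, 3.0),
    'CL3a': ({'CL3b': 2.0}, 2.0),
    'CL4a': ({}, 1.0),
}
cat2 = {
    'CL0b': ({'CL0a': 4.0}, 4.0),
    'CL1b': ({'CL1a': 4.0}, 4.0),
    'CL2b': ({'CL2a': 3.0}, 3.0),
    'CL3b': ({'CL3a': 2.0, 'CL0a': 1.0}, 3.0),
}

# Unique step: best candidate found by each catalog on its own.
mt_self1 = {cl_id: best_candidate(*info) for cl_id, info in cat1.items()}
mt_self2 = {cl_id: best_candidate(*info) for cl_id, info in cat2.items()}

# Cross step: keep a match only if both directions agree.
mt_cross1 = {
    cl_id: other if other is not None and mt_self2[other] == cl_id else None
    for cl_id, other in mt_self1.items()
}
print(mt_cross1)
```

For this data the sketch recovers the same pairs as the mt_cross column, with CL4a left unmatched.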
The steps of matching are stored in the catalogs and can be checked:
c1.show_mt_hist(50)
multiple(cat1='Cat1', cat2='Cat2')
multiple(cat1='Cat2', cat2='Cat1')
unique(cat1='Cat1', cat2='Cat2',
preference='shared_member_fraction',
minimum_share_fraction=0)
unique(cat1='Cat2', cat2='Cat1',
preference='shared_member_fraction',
minimum_share_fraction=0)
Save and Load¶
The results of the matching can easily be saved and loaded using ClEvaR tools:
mt.save_matches(c1, c2, out_dir='temp', overwrite=True)
mt.load_matches(c1, c2, out_dir='temp')
display(c1)
display(c2)
Cat1
<< ClEvar used in matching: 0.13.2 >>
* Total objects: 5
* multiple (self): 4
* multiple (other): 4
* unique (self): 4
* unique (other): 4
* cross: 4
Cat2
<< ClEvar used in matching: 0.13.2 >>
* Total objects: 4
* multiple (self): 4
* multiple (other): 4
* unique (self): 4
* unique (other): 4
* cross: 4
tags: id(id), mass(mass)
Radius unit: None
mt_input | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
id | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | mt_frac_self | mt_frac_other | mt_cross | share_mems | nmem |
str4 | float64 | object | object | object | object | float64 | float64 | object | object | float64 |
CL0a | 1.00e+15 | CL0b | CL0b | ['CL3b', 'CL0b'] | ['CL3b', 'CL0b'] | 0.8 | 1.0 | CL0b | {'CL3b': 1.0, 'CL0b': 4.0} | 5.0 |
CL1a | 2.00e+15 | CL1b | CL1b | ['CL1b'] | ['CL1b'] | 1.0 | 1.0 | CL1b | {'CL1b': 4.0} | 4.0 |
CL2a | 3.00e+15 | CL2b | CL2b | ['CL2b'] | ['CL2b'] | 1.0 | 1.0 | CL2b | {'CL2b': 3.0} | 3.0 |
CL3a | 4.00e+15 | CL3b | CL3b | ['CL3b'] | ['CL3b'] | 1.0 | 0.6666666666666666 | CL3b | {'CL3b': 2.0} | 2.0 |
CL4a | 5.00e+15 | None | None | [] | [] | 0.0 | 0.0 | None | {} | 1.0 |
tags: id(id), mass(mass)
Radius unit: None
mt_input | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
id | mass | mt_self | mt_other | mt_multi_self | mt_multi_other | mt_frac_other | mt_frac_self | mt_cross | share_mems | nmem |
str4 | float64 | object | object | object | object | float64 | float64 | object | object | float64 |
CL0b | 1.00e+15 | CL0a | CL0a | ['CL0a'] | ['CL0a'] | 0.8 | 1.0 | CL0a | {'CL0a': 4.0} | 4.0 |
CL1b | 2.00e+15 | CL1a | CL1a | ['CL1a'] | ['CL1a'] | 1.0 | 1.0 | CL1a | {'CL1a': 4.0} | 4.0 |
CL2b | 3.00e+15 | CL2a | CL2a | ['CL2a'] | ['CL2a'] | 1.0 | 1.0 | CL2a | {'CL2a': 3.0} | 3.0 |
CL3b | 4.00e+15 | CL3a | CL3a | ['CL3a', 'CL0a'] | ['CL3a', 'CL0a'] | 1.0 | 0.6666666666666666 | CL3a | {'CL3a': 2.0, 'CL0a': 1.0} | 3.0 |
Getting Matched Pairs¶
clevar has built-in functionality to plot some results of the matching, such as:
- Recovery rates
- Distances (angular and redshift) of cluster centers
- Scaling relations (mass, redshift, …)
For those cases, check the match_metrics.ipynb and match_metrics_advanced.ipynb notebooks.
If those do not meet your needs, you can directly get the matched pairs of clusters:
from clevar.match import get_matched_pairs
mt1, mt2 = get_matched_pairs(c1, c2, 'cross')
These will be catalogs with the corresponding matched pairs:
import matplotlib.pyplot as plt
plt.scatter(mt1['mass'], mt2['mass'])
<matplotlib.collections.PathCollection at 0x7ff7d886c400>
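To make explicit what "matched pairs" means here, the sketch below selects the cross-matched clusters and aligns them row by row, using plain dictionaries in place of ClCatalog objects. It is only an illustration of the idea with values copied from the tables above, not how get_matched_pairs is implemented.

```python
# Sketch only: row-aligned matched pairs for the 'cross' type, using plain
# dicts in place of ClCatalog objects. Values are copied from the tables above.
cat1_rows = [
    {'id': 'CL0a', 'mass': 1e15, 'mt_cross': 'CL0b'},
    {'id': 'CL1a', 'mass': 2e15, 'mt_cross': 'CL1b'},
    {'id': 'CL2a', 'mass': 3e15, 'mt_cross': 'CL2b'},
    {'id': 'CL3a', 'mass': 4e15, 'mt_cross': 'CL3b'},
    {'id': 'CL4a', 'mass': 5e15, 'mt_cross': None},
]
cat2_rows = [
    {'id': 'CL0b', 'mass': 1e15, 'mt_cross': 'CL0a'},
    {'id': 'CL1b', 'mass': 2e15, 'mt_cross': 'CL1a'},
    {'id': 'CL2b', 'mass': 3e15, 'mt_cross': 'CL2a'},
    {'id': 'CL3b', 'mass': 4e15, 'mt_cross': 'CL3a'},
]

# Index catalog 2 by id, then keep only catalog 1 rows that have a match.
cat2_by_id = {row['id']: row for row in cat2_rows}
pairs = [(row, cat2_by_id[row['mt_cross']])
         for row in cat1_rows if row['mt_cross'] is not None]

# Row i of each list refers to the same matched pair.
mass1 = [row1['mass'] for row1, _ in pairs]
mass2 = [row2['mass'] for _, row2 in pairs]
print(len(pairs), mass1, mass2)
```

The unmatched cluster CL4a is dropped, so both lists have four entries and can be plotted against each other directly.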

Members of matched pairs¶
The members also carry the information on the matched clusters. The column match shows to which clusters of the other catalog this member also belongs. The column in_mt_sample says if those clusters are present in the matched sample:
mt1.members
tags: id(id), id_cluster(id_cluster), ra(ra), dec(dec), pmem(pmem)
id | id_cluster | ra | dec | pmem | ind_cl | match | in_mt_sample |
---|---|---|---|---|---|---|---|
str5 | str4 | float64 | float64 | float64 | int64 | object | bool |
MEM0 | CL0a | 0.0 | 0.0 | 1.0 | 0 | ['CL3b'] | True |
MEM1 | CL0a | 10.0 | 0.0 | 1.0 | 0 | ['CL0b'] | True |
MEM2 | CL0a | 20.0 | 0.0 | 1.0 | 0 | ['CL0b'] | True |
MEM3 | CL0a | 30.0 | 0.0 | 1.0 | 0 | ['CL0b'] | True |
MEM4 | CL0a | 40.0 | 0.0 | 1.0 | 0 | ['CL0b'] | True |
MEM5 | CL1a | 50.0 | 0.0 | 1.0 | 1 | ['CL1b'] | True |
MEM6 | CL1a | 60.0 | 0.0 | 1.0 | 1 | ['CL1b'] | True |
MEM7 | CL1a | 70.0 | 0.0 | 1.0 | 1 | ['CL1b'] | True |
MEM8 | CL1a | 80.0 | 0.0 | 1.0 | 1 | ['CL1b'] | True |
MEM9 | CL2a | 90.0 | 0.0 | 1.0 | 2 | ['CL2b'] | True |
MEM10 | CL2a | 100.0 | 0.0 | 1.0 | 2 | ['CL2b'] | True |
MEM11 | CL2a | 110.0 | 0.0 | 1.0 | 2 | ['CL2b'] | True |
MEM12 | CL3a | 120.0 | 0.0 | 1.0 | 3 | ['CL3b'] | True |
MEM13 | CL3a | 130.0 | 0.0 | 1.0 | 3 | ['CL3b'] | True |
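For members matched with method='id', the match column can be pictured as a lookup of each member id in the other catalog's member list. The sketch below shows that idea on a few members of the example; member_matches is a hypothetical helper, and clevar's internal procedure may differ.

```python
# Sketch only: builds a 'match'-like column for members matched by id.
# `member_matches` is a hypothetical helper, not part of clevar.
from collections import defaultdict


def member_matches(members1, members2):
    """Each argument is a list of (member_id, cluster_id) tuples.

    Returns, for each member of the first list, the clusters of the second
    catalog that contain a member with the same id.
    """
    clusters_of = defaultdict(list)
    for mem_id, cl_id in members2:
        clusters_of[mem_id].append(cl_id)
    # .get avoids creating empty entries for members absent from catalog 2.
    return [list(clusters_of.get(mem_id, [])) for mem_id, _ in members1]


# A few members copied from the example catalogs.
members1 = [('MEM0', 'CL0a'), ('MEM1', 'CL0a'), ('MEM12', 'CL3a'), ('MEM14', 'CL4a')]
members2 = [('MEM0', 'CL3b'), ('MEM1', 'CL0b'), ('MEM12', 'CL3b')]

match = member_matches(members1, members2)
print(match)
```

MEM0 belongs to CL0a in catalog 1 but to CL3b in catalog 2, which is why its match entry differs from the matched cluster of CL0a. MEM14 has no counterpart in catalog 2, so its list is empty in this sketch.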
Outputting matched catalogs¶
To save the current catalogs, you can use the built-in write function:
c1.write('c1_temp.fits', overwrite=True)
This will allow you to save the catalog with its current labels and matching information.
Outputting matching information to original catalogs¶
Assuming your input data came from initial files, clevar also provides functions to create output files that combine all the information in them with the matching results.
To add the matching information to an input catalog, use:
from clevar.match import output_catalog_with_matching
output_catalog_with_matching('input_catalog.fits', 'output_catalog.fits', c1)
Note: input_catalog.fits must have the same number of rows as c1.
To create a matched catalog containing all columns of both input catalogs, use:
from clevar.match import output_matched_catalog
output_matched_catalog('input_catalog1.fits', 'input_catalog2.fits',
'output_catalog.fits', c1, c2, matching_type='cross')
where matching_type must be cross, cat1 or cat2.
Note: input_catalog1.fits must have the same number of rows as c1 (and the same for input_catalog2.fits and c2).