Matching catalogs based on membership (simple)

Matching two catalogs based on membership using a configuration dictionary

%load_ext autoreload
%autoreload 2

ClCatalogs

Given some input data

import numpy as np
from astropy.table import Table
input1 = Table({'ID': ['CL0a', 'CL1a', 'CL2a', 'CL3a', 'CL4a']})
input1['MASS'] = 1e14*np.arange(1, 6)*10
input2 = Table({'ID': ['CL0b', 'CL1b', 'CL2b', 'CL3b']})
input2['MASS'] = 1e14*np.arange(1, 5)*10
display(input1)
display(input2)
input1_mem = Table(
    {'ID':[
        'MEM0', 'MEM1', 'MEM2', 'MEM3', 'MEM4',
        'MEM5', 'MEM6', 'MEM7', 'MEM8', 'MEM9',
        'MEM10', 'MEM11', 'MEM12', 'MEM13', 'MEM14'],
     'ID_CLUSTER': [
         'CL0a', 'CL0a', 'CL0a', 'CL0a', 'CL0a',
         'CL1a', 'CL1a', 'CL1a', 'CL1a', 'CL2a',
         'CL2a', 'CL2a', 'CL3a', 'CL3a', 'CL4a'],
    })
input2_mem = Table(
    {'ID':[
        'MEM0', 'MEM1', 'MEM2', 'MEM3', 'MEM4',
        'MEM5', 'MEM6', 'MEM7', 'MEM8', 'MEM9',
        'MEM10', 'MEM11', 'MEM12', 'MEM13'],
     'ID_CLUSTER': [
         'CL3b', 'CL0b', 'CL0b', 'CL0b', 'CL0b',
         'CL1b', 'CL1b', 'CL1b', 'CL1b', 'CL2b',
         'CL2b', 'CL2b', 'CL3b', 'CL3b'],
    })
input1_mem['RA'] = np.arange(len(input1_mem))*10.0
input2_mem['RA'] = np.arange(len(input2_mem))*10.0
input1_mem['DEC'] = 0.0
input2_mem['DEC'] = 0.0
input1_mem['Z'] = 0.1
input2_mem['Z'] = 0.1
input1_mem['PMEM'] = 1.0
input2_mem['PMEM'] = 1.0
display(input1_mem)
display(input2_mem)
Table length=5
ID    MASS
str4  float64
CL0a  1000000000000000.0
CL1a  2000000000000000.0
CL2a  3000000000000000.0
CL3a  4000000000000000.0
CL4a  5000000000000000.0

Table length=4
ID    MASS
str4  float64
CL0b  1000000000000000.0
CL1b  2000000000000000.0
CL2b  3000000000000000.0
CL3b  4000000000000000.0

Table length=15
ID     ID_CLUSTER  RA       DEC      Z        PMEM
str5   str4        float64  float64  float64  float64
MEM0   CL0a        0.0      0.0      0.1      1.0
MEM1   CL0a        10.0     0.0      0.1      1.0
MEM2   CL0a        20.0     0.0      0.1      1.0
MEM3   CL0a        30.0     0.0      0.1      1.0
MEM4   CL0a        40.0     0.0      0.1      1.0
MEM5   CL1a        50.0     0.0      0.1      1.0
MEM6   CL1a        60.0     0.0      0.1      1.0
MEM7   CL1a        70.0     0.0      0.1      1.0
MEM8   CL1a        80.0     0.0      0.1      1.0
MEM9   CL2a        90.0     0.0      0.1      1.0
MEM10  CL2a        100.0    0.0      0.1      1.0
MEM11  CL2a        110.0    0.0      0.1      1.0
MEM12  CL3a        120.0    0.0      0.1      1.0
MEM13  CL3a        130.0    0.0      0.1      1.0
MEM14  CL4a        140.0    0.0      0.1      1.0

Table length=14
ID     ID_CLUSTER  RA       DEC      Z        PMEM
str5   str4        float64  float64  float64  float64
MEM0   CL3b        0.0      0.0      0.1      1.0
MEM1   CL0b        10.0     0.0      0.1      1.0
MEM2   CL0b        20.0     0.0      0.1      1.0
MEM3   CL0b        30.0     0.0      0.1      1.0
MEM4   CL0b        40.0     0.0      0.1      1.0
MEM5   CL1b        50.0     0.0      0.1      1.0
MEM6   CL1b        60.0     0.0      0.1      1.0
MEM7   CL1b        70.0     0.0      0.1      1.0
MEM8   CL1b        80.0     0.0      0.1      1.0
MEM9   CL2b        90.0     0.0      0.1      1.0
MEM10  CL2b        100.0    0.0      0.1      1.0
MEM11  CL2b        110.0    0.0      0.1      1.0
MEM12  CL3b        120.0    0.0      0.1      1.0
MEM13  CL3b        130.0    0.0      0.1      1.0

Create two ClCatalog objects. They have the same properties as astropy tables, with additional functionality. For membership matching, the main columns to include are:

  • id: must correspond to id_cluster in the cluster member catalog.

  • mass (or mass proxy): necessary for matching when shared_member_fraction is used as the preference criterion for unique matches (the default).

All of the columns can be added when creating the ClCatalog object, by passing them as keyword arguments:

cat = ClCatalog('Cat', ra=[0, 1])

or can also be added afterwards:

cat = ClCatalog('Cat')
cat['ra'] = [0, 1]
from clevar.catalog import ClCatalog
c1 = ClCatalog('Cat1', id=input1['ID'], mass=input1['MASS'])
c2 = ClCatalog('Cat2', id=input2['ID'], mass=input2['MASS'])

# Format for nice display
c1['mass'].info.format = '.2e'
c2['mass'].info.format = '.2e'

display(c1)
display(c2)
Cat1
tags: id(id), mass(mass)
Radius unit: None
ClData length=5
id    mass      mt_self  mt_other  mt_multi_self  mt_multi_other
str4  float64   object   object    object         object
CL0a  1.00e+15  None     None      []             []
CL1a  2.00e+15  None     None      []             []
CL2a  3.00e+15  None     None      []             []
CL3a  4.00e+15  None     None      []             []
CL4a  5.00e+15  None     None      []             []

Cat2
tags: id(id), mass(mass)
Radius unit: None
ClData length=4
id    mass      mt_self  mt_other  mt_multi_self  mt_multi_other
str4  float64   object   object    object         object
CL0b  1.00e+15  None     None      []             []
CL1b  2.00e+15  None     None      []             []
CL2b  3.00e+15  None     None      []             []
CL3b  4.00e+15  None     None      []             []

Members can be added to the cluster catalog with the add_members function. Its format is similar to instantiating a ClCatalog object: the columns are passed as keyword arguments (the id_cluster key is always required and must correspond to id in the main cluster catalog).

c1.add_members(id=input1_mem['ID'], id_cluster=input1_mem['ID_CLUSTER'],
               ra=input1_mem['RA'], dec=input1_mem['DEC'], pmem=input1_mem['PMEM'])
c2.add_members(id=input2_mem['ID'], id_cluster=input2_mem['ID_CLUSTER'],
               ra=input2_mem['RA'], dec=input2_mem['DEC'], pmem=input2_mem['PMEM'])

display(c1.members)
display(c2.members)
members
tags: id(id), id_cluster(id_cluster), ra(ra), dec(dec), pmem(pmem)
ClData length=15
id     id_cluster  ra       dec      pmem     ind_cl
str5   str4        float64  float64  float64  int64
MEM0   CL0a        0.0      0.0      1.0      0
MEM1   CL0a        10.0     0.0      1.0      0
MEM2   CL0a        20.0     0.0      1.0      0
MEM3   CL0a        30.0     0.0      1.0      0
MEM4   CL0a        40.0     0.0      1.0      0
MEM5   CL1a        50.0     0.0      1.0      1
MEM6   CL1a        60.0     0.0      1.0      1
MEM7   CL1a        70.0     0.0      1.0      1
MEM8   CL1a        80.0     0.0      1.0      1
MEM9   CL2a        90.0     0.0      1.0      2
MEM10  CL2a        100.0    0.0      1.0      2
MEM11  CL2a        110.0    0.0      1.0      2
MEM12  CL3a        120.0    0.0      1.0      3
MEM13  CL3a        130.0    0.0      1.0      3
MEM14  CL4a        140.0    0.0      1.0      4

members
tags: id(id), id_cluster(id_cluster), ra(ra), dec(dec), pmem(pmem)
ClData length=14
id     id_cluster  ra       dec      pmem     ind_cl
str5   str4        float64  float64  float64  int64
MEM0   CL3b        0.0      0.0      1.0      3
MEM1   CL0b        10.0     0.0      1.0      0
MEM2   CL0b        20.0     0.0      1.0      0
MEM3   CL0b        30.0     0.0      1.0      0
MEM4   CL0b        40.0     0.0      1.0      0
MEM5   CL1b        50.0     0.0      1.0      1
MEM6   CL1b        60.0     0.0      1.0      1
MEM7   CL1b        70.0     0.0      1.0      1
MEM8   CL1b        80.0     0.0      1.0      1
MEM9   CL2b        90.0     0.0      1.0      2
MEM10  CL2b        100.0    0.0      1.0      2
MEM11  CL2b        110.0    0.0      1.0      2
MEM12  CL3b        120.0    0.0      1.0      3
MEM13  CL3b        130.0    0.0      1.0      3
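
The ind_cl column created by add_members appears to hold, for each member, the row of its cluster in the main catalog (e.g. MEM0 of Cat2 belongs to CL3b, which is row 3). The sketch below reproduces that column in plain Python. cluster_row_index is a hypothetical helper for illustration, not part of ClEvaR, and it assumes cluster ids are unique.

```python
def cluster_row_index(cluster_ids, member_cluster_ids):
    """Hypothetical helper (not ClEvaR API): row index of each member's cluster.

    Assumes cluster ids are unique. A member whose id_cluster is missing from
    cluster_ids raises KeyError, mirroring the requirement that id_cluster
    must correspond to id in the main catalog.
    """
    # Build the id -> row lookup once, then resolve every member against it.
    row_of = {cl_id: row for row, cl_id in enumerate(cluster_ids)}
    return [row_of[cl_id] for cl_id in member_cluster_ids]


# Cluster ids and member id_cluster values of Cat2, copied from the tables above
cluster_ids2 = ['CL0b', 'CL1b', 'CL2b', 'CL3b']
member_cluster_ids2 = (['CL3b'] + ['CL0b'] * 4 + ['CL1b'] * 4
                       + ['CL2b'] * 3 + ['CL3b'] * 2)

ind_cl = cluster_row_index(cluster_ids2, member_cluster_ids2)
print(ind_cl)  # [3, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3]
```

A dictionary lookup keeps this linear in the number of members, which matters more for real member catalogs than for this toy example.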

The catalogs can also be read directly from files; for more details, see catalogs.ipynb.

Matching

Import MembershipMatch and create an object for matching:

from clevar.match import MembershipMatch
mt = MembershipMatch()

Prepare the configuration. The main values are:

  • type: Type of matching to be considered. Can be a simple match of ClCatalog1->ClCatalog2 (cat1), ClCatalog2->ClCatalog1 (cat2) or cross matching.

  • preference: When there are multiple matches, how the best candidate is chosen.

  • minimum_share_fraction1: Minimum share fraction of catalog 1 to consider in matches (default=0).

  • minimum_share_fraction2: Minimum share fraction of catalog 2 to consider in matches (default=0).

  • match_members: Match the members catalogs (default=True), necessary if not already made.

  • match_members_kwargs: dictionary of arguments to match members, needed if match_members=True. Keys are:

    • method(str): Method for matching. Options are id or angular_distance.

    • radius(str, None): For method='angular_distance'. Radius for matching, in the format 'value unit' (e.g. 1 arcsec, 1 Mpc).

    • cosmo(clevar.Cosmology, None): For method='angular_distance'. Cosmology object for when radius has physical units.

  • match_members_save: Save a file with the matched members (default=False).

  • match_members_load: Load matched members (default=False); if True, skips matching (and saving) of members.

  • match_members_file: file to save matching of members, needed if match_members_save or match_members_load is True.

  • shared_members_fill: Adds shared members dicts and nmem to mt_input in catalogs (default=True), necessary if not already made.

  • shared_members_save: Save files with the shared members (default=False).

  • shared_members_load: Load files with the shared members (default=False); if True, skips matching (and saving) of members and filling (and saving) of shared members.

  • shared_members_file: Prefix of file names to save shared members, needed if shared_members_save or shared_members_load is True.

  • verbose: Print result for individual matches (default=True).

match_config = {
    'type': 'cross', # options are cross, cat1, cat2
    'preference': 'shared_member_fraction', # other options are more_massive, angular_proximity or redshift_proximity
    'minimum_share_fraction1': 0,
    'minimum_share_fraction2': 0,
    'match_members_kwargs': {'method':'id'},
}
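
Several keys in the option list only matter in combination with others (for example, a file name is needed once saving or loading is switched on). The sketch below encodes those dependencies so a configuration can be sanity-checked before calling match_from_config. check_match_config is a hypothetical helper, not part of ClEvaR, and treating preference as required is an assumption.

```python
def check_match_config(config):
    """Hypothetical sanity check (not part of ClEvaR) for a membership-matching config.

    Returns a list of problems; an empty list means none were found.
    Defaults follow the option list above (match_members=True, *_save/*_load=False).
    """
    problems = []
    if config.get('type') not in ('cross', 'cat1', 'cat2'):
        problems.append("type must be 'cross', 'cat1' or 'cat2'")
    if 'preference' not in config:  # assumption: treated as required
        problems.append('preference is missing')
    # Matching members (default True) needs the member-matching arguments.
    if config.get('match_members', True):
        kwargs = config.get('match_members_kwargs')
        if kwargs is None:
            problems.append('match_members_kwargs is needed when match_members=True')
        elif kwargs.get('method') == 'angular_distance' and kwargs.get('radius') is None:
            problems.append("radius is needed for method='angular_distance'")
    # Saving or loading matched members or shared members requires a file name.
    for step in ('match_members', 'shared_members'):
        requested = config.get(f'{step}_save', False) or config.get(f'{step}_load', False)
        if requested and f'{step}_file' not in config:
            problems.append(f'{step}_file is needed when {step}_save or {step}_load is True')
    return problems


example_config = {
    'type': 'cross',
    'preference': 'shared_member_fraction',
    'minimum_share_fraction1': 0,
    'minimum_share_fraction2': 0,
    'match_members_kwargs': {'method': 'id'},
}
print(check_match_config(example_config))  # []
print(check_match_config({**example_config, 'shared_members_save': True}))
```

Returning a list instead of raising on the first problem lets all issues in a configuration be reported at once.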

Once the configuration is prepared, the whole process can be done with one call:

%%time
mt.match_from_config(c1, c2, match_config)
28 members were matched.

## Multiple match (catalog 1)
Finding candidates (Cat1)
* 4/5 objects matched.

## Multiple match (catalog 2)
Finding candidates (Cat2)
* 4/4 objects matched.

## Finding unique matches of catalog 1
Unique Matches (Cat1)
* 4/5 objects matched.

## Finding unique matches of catalog 2
Unique Matches (Cat2)
* 4/4 objects matched.
Cross Matches (Cat1)
* 4/5 objects matched.
Cross Matches (Cat2)
* 4/4 objects matched.
CPU times: user 8.34 ms, sys: 438 µs, total: 8.78 ms
Wall time: 8.12 ms

This will fill the matching columns in the catalogs:

  • mt_multi_self: Multiple matches found

  • mt_multi_other: Multiple matches found by the other catalog

  • mt_self: Best candidate found

  • mt_other: Best candidate found by the other catalog

  • mt_frac_self: Fraction of shared members with the best candidate found

  • mt_frac_other: Fraction of shared members by the best candidate found by the other catalog, relative to the other catalog

  • mt_cross: Best candidate found in both directions
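
The mt_cross rule can be sketched in plain Python: a pair is kept only when each cluster's best candidate (mt_self) points back to the other. cross_matches is a hypothetical helper, not ClEvaR's implementation; the input values are the mt_self entries this example produces.

```python
def cross_matches(mt_self1, mt_self2):
    """Hypothetical helper: {cat1 id: cat2 id} for pairs agreeing in both directions.

    mt_self1 maps cat1 ids to their best cat2 candidate (or None);
    mt_self2 maps cat2 ids to their best cat1 candidate (or None).
    """
    return {id1: id2 for id1, id2 in mt_self1.items()
            if id2 is not None and mt_self2.get(id2) == id1}


# Best candidates (mt_self) obtained in this example
mt_self1 = {'CL0a': 'CL0b', 'CL1a': 'CL1b', 'CL2a': 'CL2b', 'CL3a': 'CL3b', 'CL4a': None}
mt_self2 = {'CL0b': 'CL0a', 'CL1b': 'CL1a', 'CL2b': 'CL2a', 'CL3b': 'CL3a'}

cross1 = cross_matches(mt_self1, mt_self2)
print(f'{len(cross1)}/{len(mt_self1)} objects cross-matched')  # 4/5 objects cross-matched
```

If CL3b had chosen CL0a as its best candidate instead, CL3a would keep its mt_self but lose its cross match.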

If pmem is present in the members catalogs, the shared fractions are computed by:

\(\frac{\sum_{shared\;members}Pmem_i}{\sum_{cluster\;members}Pmem_i}\)
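
The sketch below applies this formula to the example members. It assumes the members were matched by id (method='id') and that the numerator uses the pmem values of the catalog whose fraction is being computed (here every pmem is 1, so that choice does not change the result). shared_fractions is a hypothetical helper, not ClEvaR's implementation; it reproduces the share_mems, nmem and mt_frac_self values displayed after matching.

```python
from collections import defaultdict


def shared_fractions(members_self, members_other):
    """Hypothetical helper. members_*: dict member id -> (cluster id, pmem).

    Returns (nmem, share_mems, frac):
      nmem[cl]            sum of pmem over all members of cluster cl
      share_mems[cl][c2]  sum of pmem over members of cl also in cluster c2
      frac[cl][c2]        share_mems[cl][c2] / nmem[cl]
    """
    nmem = defaultdict(float)
    share_mems = defaultdict(lambda: defaultdict(float))
    for mem_id, (cl_id, pmem) in members_self.items():
        nmem[cl_id] += pmem
        if mem_id in members_other:  # member matched by id
            other_cl = members_other[mem_id][0]
            share_mems[cl_id][other_cl] += pmem
    frac = {cl: {other: shared / nmem[cl] for other, shared in cands.items()}
            for cl, cands in share_mems.items()}
    return nmem, share_mems, frac


# Example members: id -> (id_cluster, pmem), as in the input tables
members1 = {f'MEM{i}': (cl, 1.0) for i, cl in enumerate(
    ['CL0a'] * 5 + ['CL1a'] * 4 + ['CL2a'] * 3 + ['CL3a'] * 2 + ['CL4a'])}
members2 = {f'MEM{i}': (cl, 1.0) for i, cl in enumerate(
    ['CL3b'] + ['CL0b'] * 4 + ['CL1b'] * 4 + ['CL2b'] * 3 + ['CL3b'] * 2)}

nmem1, share1, frac1 = shared_fractions(members1, members2)
nmem2, share2, frac2 = shared_fractions(members2, members1)
print(dict(share1['CL0a']), nmem1['CL0a'], frac1['CL0a']['CL0b'])
# {'CL3b': 1.0, 'CL0b': 4.0} 5.0 0.8
```

The same numbers give mt_frac_other: for CL3a, whose best candidate found by Cat2 is CL3b, it is frac2['CL3b']['CL3a'] = 2/3, matching the 0.667 shown in the Cat1 table.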

display(c1)
display(c2)
Cat1
tags: id(id), mass(mass)
Radius unit: None
ClData length=5
mt_input: share_mems, nmem
id    mass      mt_self  mt_other  mt_multi_self     mt_multi_other    mt_frac_self        mt_frac_other       mt_cross  share_mems                  nmem
str4  float64   object   object    object            object            float64             float64             object    object                      float64
CL0a  1.00e+15  CL0b     CL0b      ['CL3b', 'CL0b']  ['CL3b', 'CL0b']  0.8                 1.0                 CL0b      {'CL3b': 1.0, 'CL0b': 4.0}  5.0
CL1a  2.00e+15  CL1b     CL1b      ['CL1b']          ['CL1b']          1.0                 1.0                 CL1b      {'CL1b': 4.0}               4.0
CL2a  3.00e+15  CL2b     CL2b      ['CL2b']          ['CL2b']          1.0                 1.0                 CL2b      {'CL2b': 3.0}               3.0
CL3a  4.00e+15  CL3b     CL3b      ['CL3b']          ['CL3b']          1.0                 0.6666666666666666  CL3b      {'CL3b': 2.0}               2.0
CL4a  5.00e+15  None     None      []                []                0.0                 0.0                 None      {}                          1.0

Cat2
tags: id(id), mass(mass)
Radius unit: None
ClData length=4
mt_input: share_mems, nmem
id    mass      mt_self  mt_other  mt_multi_self     mt_multi_other    mt_frac_other       mt_frac_self        mt_cross  share_mems                  nmem
str4  float64   object   object    object            object            float64             float64             object    object                      float64
CL0b  1.00e+15  CL0a     CL0a      ['CL0a']          ['CL0a']          0.8                 1.0                 CL0a      {'CL0a': 4.0}               4.0
CL1b  2.00e+15  CL1a     CL1a      ['CL1a']          ['CL1a']          1.0                 1.0                 CL1a      {'CL1a': 4.0}               4.0
CL2b  3.00e+15  CL2a     CL2a      ['CL2a']          ['CL2a']          1.0                 1.0                 CL2a      {'CL2a': 3.0}               3.0
CL3b  4.00e+15  CL3a     CL3a      ['CL3a', 'CL0a']  ['CL3a', 'CL0a']  1.0                 0.6666666666666666  CL3a      {'CL3a': 2.0, 'CL0a': 1.0}  3.0

The steps of matching are stored in the catalogs and can be checked:

c1.show_mt_hist(50)
multiple(cat1='Cat1', cat2='Cat2')

multiple(cat1='Cat2', cat2='Cat1')

unique(cat1='Cat1', cat2='Cat2',
       preference='shared_member_fraction',
       minimum_share_fraction=0)

unique(cat1='Cat2', cat2='Cat1',
       preference='shared_member_fraction',
       minimum_share_fraction=0)
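
The unique step picks one candidate per cluster out of the multiple matches. The sketch below is a simplified version under stated assumptions, not ClEvaR's code: clusters are visited from most to least massive (assumed here to be why the mass column matters), candidates are ranked by shared member fraction, candidates below the minimum share fraction are skipped, and each candidate can be assigned only once. unique_matches is a hypothetical name.

```python
def unique_matches(mass, candidates, minimum_share_fraction=0.0):
    """Hypothetical sketch of a 'unique' step with shared_member_fraction preference.

    mass: dict cluster id -> mass (self catalog)
    candidates: dict cluster id -> {candidate id: shared fraction}
    Returns dict cluster id -> chosen candidate id, or None.
    """
    taken = set()  # candidates already assigned to another cluster
    best = {}
    # Assumption: more massive clusters choose first.
    for cl_id in sorted(mass, key=mass.get, reverse=True):
        best[cl_id] = None
        ranked = sorted(candidates.get(cl_id, {}).items(),
                        key=lambda item: item[1], reverse=True)
        for cand_id, share_frac in ranked:
            if share_frac >= minimum_share_fraction and cand_id not in taken:
                best[cl_id] = cand_id
                taken.add(cand_id)
                break
    return best


mass1 = {'CL0a': 1e15, 'CL1a': 2e15, 'CL2a': 3e15, 'CL3a': 4e15, 'CL4a': 5e15}
# Shared fractions of the Cat1 clusters with their Cat2 candidates in this example
cands1 = {'CL0a': {'CL3b': 0.2, 'CL0b': 0.8}, 'CL1a': {'CL1b': 1.0},
          'CL2a': {'CL2b': 1.0}, 'CL3a': {'CL3b': 1.0}}

print(unique_matches(mass1, cands1))
# {'CL4a': None, 'CL3a': 'CL3b', 'CL2a': 'CL2b', 'CL1a': 'CL1b', 'CL0a': 'CL0b'}
```

With minimum_share_fraction=0.9, CL0a would be left unmatched in this sketch: CL0b holds only 0.8 of its members and CL3b is already taken by CL3a.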

Save and Load

The results of the matching can easily be saved and loaded using ClEvaR tools:

mt.save_matches(c1, c2, out_dir='temp', overwrite=True)
mt.load_matches(c1, c2, out_dir='temp')
display(c1)
display(c2)
Cat1
<< ClEvar used in matching: 0.13.2 >>
 * Total objects:    5
 * multiple (self):  4
 * multiple (other): 4
 * unique (self):    4
 * unique (other):   4
 * cross:            4

Cat2
<< ClEvar used in matching: 0.13.2 >>
 * Total objects:    4
 * multiple (self):  4
 * multiple (other): 4
 * unique (self):    4
 * unique (other):   4
 * cross:            4
Cat1
tags: id(id), mass(mass)
Radius unit: None
ClData length=5
mt_input: share_mems, nmem
id    mass      mt_self  mt_other  mt_multi_self     mt_multi_other    mt_frac_self        mt_frac_other       mt_cross  share_mems                  nmem
str4  float64   object   object    object            object            float64             float64             object    object                      float64
CL0a  1.00e+15  CL0b     CL0b      ['CL3b', 'CL0b']  ['CL3b', 'CL0b']  0.8                 1.0                 CL0b      {'CL3b': 1.0, 'CL0b': 4.0}  5.0
CL1a  2.00e+15  CL1b     CL1b      ['CL1b']          ['CL1b']          1.0                 1.0                 CL1b      {'CL1b': 4.0}               4.0
CL2a  3.00e+15  CL2b     CL2b      ['CL2b']          ['CL2b']          1.0                 1.0                 CL2b      {'CL2b': 3.0}               3.0
CL3a  4.00e+15  CL3b     CL3b      ['CL3b']          ['CL3b']          1.0                 0.6666666666666666  CL3b      {'CL3b': 2.0}               2.0
CL4a  5.00e+15  None     None      []                []                0.0                 0.0                 None      {}                          1.0

Cat2
tags: id(id), mass(mass)
Radius unit: None
ClData length=4
mt_input: share_mems, nmem
id    mass      mt_self  mt_other  mt_multi_self     mt_multi_other    mt_frac_other       mt_frac_self        mt_cross  share_mems                  nmem
str4  float64   object   object    object            object            float64             float64             object    object                      float64
CL0b  1.00e+15  CL0a     CL0a      ['CL0a']          ['CL0a']          0.8                 1.0                 CL0a      {'CL0a': 4.0}               4.0
CL1b  2.00e+15  CL1a     CL1a      ['CL1a']          ['CL1a']          1.0                 1.0                 CL1a      {'CL1a': 4.0}               4.0
CL2b  3.00e+15  CL2a     CL2a      ['CL2a']          ['CL2a']          1.0                 1.0                 CL2a      {'CL2a': 3.0}               3.0
CL3b  4.00e+15  CL3a     CL3a      ['CL3a', 'CL0a']  ['CL3a', 'CL0a']  1.0                 0.6666666666666666  CL3a      {'CL3a': 2.0, 'CL0a': 1.0}  3.0

Getting Matched Pairs

ClEvaR has built-in functionality to plot some results of the matching, such as:

  • Recovery rates

  • Distances (angular and redshift) of cluster centers

  • Scaling relations (mass, redshift, …)

For those cases, check the match_metrics.ipynb and match_metrics_advanced.ipynb notebooks.

If those do not meet your needs, you can get the matched pairs of clusters directly:

from clevar.match import get_matched_pairs
mt1, mt2 = get_matched_pairs(c1, c2, 'cross')

These will be catalogs with the corresponding matched pairs:

import matplotlib.pyplot as plt
# Mass of each Cat1 cluster against the mass of its cross-matched Cat2 counterpart
plt.scatter(mt1['mass'], mt2['mass'])
[Figure: scatter plot of mt1['mass'] (Cat1) versus mt2['mass'] (Cat2) for the cross-matched pairs]
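
For a cross match, the pairing can be pictured as: keep the Cat1 clusters that have a cross match and line each one up with its Cat2 counterpart, so that row i of both outputs is one pair. The sketch below shows that idea with lists of dicts; matched_pairs is a hypothetical stand-in, not get_matched_pairs itself, and unlike the library function it does not carry the members along.

```python
def matched_pairs(cat1, cat2, match_col='mt_cross'):
    """Hypothetical sketch: rows of cat1 and cat2 aligned so pairs1[i] matches pairs2[i].

    cat1, cat2: lists of row dicts with at least an 'id' key;
    cat1 rows also hold the matched cat2 id (or None) under match_col.
    """
    rows2_by_id = {row['id']: row for row in cat2}
    pairs1 = [row for row in cat1 if row[match_col] is not None]
    pairs2 = [rows2_by_id[row[match_col]] for row in pairs1]
    return pairs1, pairs2


# Toy version of c1 and c2 after the cross match in this example
cat1 = [{'id': f'CL{i}a', 'mass': (i + 1) * 1e15,
         'mt_cross': f'CL{i}b' if i < 4 else None} for i in range(5)]
cat2 = [{'id': f'CL{i}b', 'mass': (i + 1) * 1e15} for i in range(4)]

pairs1, pairs2 = matched_pairs(cat1, cat2)
print([(r1['id'], r2['id']) for r1, r2 in zip(pairs1, pairs2)])
```

Because both outputs share the same row order, any column of one can be plotted directly against the other, as in the scatter plot above.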

Members of matched pairs

The members also carry information on the matched clusters. The column match shows which clusters of the other catalog this member also belongs to. The column in_mt_sample indicates whether those clusters are present in the matched sample:

mt1.members
members
tags: id(id), id_cluster(id_cluster), ra(ra), dec(dec), pmem(pmem)
ClData length=14
id     id_cluster  ra       dec      pmem     ind_cl  match     in_mt_sample
str5   str4        float64  float64  float64  int64   object    bool
MEM0   CL0a        0.0      0.0      1.0      0       ['CL3b']  True
MEM1   CL0a        10.0     0.0      1.0      0       ['CL0b']  True
MEM2   CL0a        20.0     0.0      1.0      0       ['CL0b']  True
MEM3   CL0a        30.0     0.0      1.0      0       ['CL0b']  True
MEM4   CL0a        40.0     0.0      1.0      0       ['CL0b']  True
MEM5   CL1a        50.0     0.0      1.0      1       ['CL1b']  True
MEM6   CL1a        60.0     0.0      1.0      1       ['CL1b']  True
MEM7   CL1a        70.0     0.0      1.0      1       ['CL1b']  True
MEM8   CL1a        80.0     0.0      1.0      1       ['CL1b']  True
MEM9   CL2a        90.0     0.0      1.0      2       ['CL2b']  True
MEM10  CL2a        100.0    0.0      1.0      2       ['CL2b']  True
MEM11  CL2a        110.0    0.0      1.0      2       ['CL2b']  True
MEM12  CL3a        120.0    0.0      1.0      3       ['CL3b']  True
MEM13  CL3a        130.0    0.0      1.0      3       ['CL3b']  True
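
The match column can be rebuilt from member ids alone: for each member of a matched Cat1 cluster, list the Cat2 clusters that contain a member with the same id. The sketch assumes members were matched by id and reads in_mt_sample as "at least one of those clusters is in the matched sample"; member_match_info and that reading are assumptions, not ClEvaR's definitions.

```python
def member_match_info(members1, members2, matched1, matched2):
    """Hypothetical sketch of the 'match' and 'in_mt_sample' member columns.

    members1, members2: lists of (member id, cluster id) pairs
    matched1, matched2: sets of cluster ids present in the matched samples
    Returns (member id, cluster id, match, in_mt_sample) for members of matched1.
    """
    clusters_of = {}  # member id -> Cat2 clusters containing it
    for mem_id, cl_id in members2:
        clusters_of.setdefault(mem_id, []).append(cl_id)
    rows = []
    for mem_id, cl_id in members1:
        if cl_id not in matched1:  # members of unmatched clusters are dropped
            continue
        match = clusters_of.get(mem_id, [])
        in_mt_sample = any(other in matched2 for other in match)  # assumed meaning
        rows.append((mem_id, cl_id, match, in_mt_sample))
    return rows


members1 = [(f'MEM{i}', cl) for i, cl in enumerate(
    ['CL0a'] * 5 + ['CL1a'] * 4 + ['CL2a'] * 3 + ['CL3a'] * 2 + ['CL4a'])]
members2 = [(f'MEM{i}', cl) for i, cl in enumerate(
    ['CL3b'] + ['CL0b'] * 4 + ['CL1b'] * 4 + ['CL2b'] * 3 + ['CL3b'] * 2)]
matched1 = {'CL0a', 'CL1a', 'CL2a', 'CL3a'}
matched2 = {'CL0b', 'CL1b', 'CL2b', 'CL3b'}

rows = member_match_info(members1, members2, matched1, matched2)
print(len(rows), rows[0])  # 14 ('MEM0', 'CL0a', ['CL3b'], True)
```

This also shows why mt1.members has 14 rows rather than 15: MEM14 belongs to CL4a, which has no cross match.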

Outputting matched catalogs

To save the current catalogs, you can use the built-in write method:

c1.write('c1_temp.fits', overwrite=True)

This will allow you to save the catalog with its current labels and matching information.

Outputting matching information to original catalogs

Assuming your input data came from files, ClEvaR also provides functions to create output files that combine all the information in those files with the matching results.

To add the matching information to an input catalog, use:

from clevar.match import output_catalog_with_matching
output_catalog_with_matching('input_catalog.fits', 'output_catalog.fits', c1)
  • note: input_catalog.fits must have the same number of rows as c1.

To create a matched catalog containing all columns of both input catalogs, use:

from clevar.match import output_matched_catalog
output_matched_catalog('input_catalog1.fits', 'input_catalog2.fits',
    'output_catalog.fits', c1, c2, matching_type='cross')

where matching_type must be cross, cat1 or cat2.

  • note: input_catalog1.fits must have the same number of rows as c1 (and likewise input_catalog2.fits and c2).
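
The row-count requirements follow from the rows of each input file being paired with the rows of its ClCatalog by position. The sketch below illustrates that idea with lists of dicts. join_matched and the cat1_/cat2_ column prefixes are hypothetical choices for illustration, not the behavior or naming of output_matched_catalog.

```python
def join_matched(input1, input2, c1_mt_cross, c2_ids):
    """Hypothetical sketch of a matched catalog built from two input tables.

    input1, input2: lists of row dicts read from the input files
    c1_mt_cross: cross-matched Cat2 id (or None) for each row of c1
    c2_ids: id of each row of c2
    """
    # Positional pairing only makes sense if the lengths agree.
    if len(input1) != len(c1_mt_cross) or len(input2) != len(c2_ids):
        raise ValueError('each input catalog must have the same number of rows as its ClCatalog')
    rows2_by_id = dict(zip(c2_ids, input2))
    matched = []
    for row1, id2 in zip(input1, c1_mt_cross):
        if id2 is None:
            continue  # unmatched clusters are left out
        merged = {f'cat1_{key}': value for key, value in row1.items()}
        merged.update({f'cat2_{key}': value for key, value in rows2_by_id[id2].items()})
        matched.append(merged)
    return matched


input1 = [{'ID': f'CL{i}a', 'MASS': (i + 1) * 1e15} for i in range(5)]
input2 = [{'ID': f'CL{i}b', 'MASS': (i + 1) * 1e15} for i in range(4)]
c1_mt_cross = ['CL0b', 'CL1b', 'CL2b', 'CL3b', None]
c2_ids = ['CL0b', 'CL1b', 'CL2b', 'CL3b']

out = join_matched(input1, input2, c1_mt_cross, c2_ids)
print(len(out), out[0])
```

Prefixing column names keeps same-named columns (here ID and MASS) from both inputs from overwriting each other.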