Reading HDF5 Files
Overview
Many TXPipe outputs are in the HDF5 format. This is a fast and flexible file type that can also be easily read/written in parallel.
HDF5 files contain three types of object:
datasets
are equivalent to saved numpy arrays.
groups
are like directories and can contain datasets or sub-groups.
attributes
are for small pieces of metadata, and a set of attributes can be converted to a Python dictionary. They can be attached to whole files or to individual groups or datasets.
The naming scheme for datasets and groups is the same as for Unix files and folders, e.g. f['group/subgroup/dataset'].
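As a minimal sketch of how the three object types fit together, the following builds a small throwaway file and reads it back by path. The file name demo.hdf5 and the group, dataset, and attribute names are made up for illustration and are not part of any TXPipe output.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.hdf5")  # hypothetical file name

    # Write a group, a sub-group, a dataset and one attribute
    with h5py.File(path, "w") as f:
        # Intermediate groups in the path are created automatically
        sub = f.create_group("group/subgroup")
        sub.create_dataset("dataset", data=np.arange(5))
        sub.attrs["units"] = "arbitrary"

    # Read everything back using Unix-style paths
    with h5py.File(path, "r") as f:
        values = f["group/subgroup/dataset"][:]
        units = f["group/subgroup"].attrs["units"]
        members = list(f["group"].keys())

print(values, units, members)
```

Opening the file in a with block closes it automatically, which is why the data is copied into ordinary variables before the block ends.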
From the command line, you can use the h5ls command to list the contents of an HDF5 file:
h5ls -r filename.hdf5
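If h5ls is not available, a rough Python equivalent can be sketched with the visititems method of the h5py library. The helper name list_contents is hypothetical, the demo file contents are invented, and the output format only loosely imitates h5ls -r.

```python
import os
import tempfile

import h5py
import numpy as np


def list_contents(filename):
    """Return one line per group or dataset in the file (hypothetical helper)."""
    lines = []

    def visit(name, obj):
        # name is the path relative to the root, without a leading slash
        if isinstance(obj, h5py.Dataset):
            lines.append(f"/{name}  Dataset {obj.shape} {obj.dtype}")
        else:
            lines.append(f"/{name}  Group")

    with h5py.File(filename, "r") as f:
        f.visititems(visit)  # walks every group and dataset below the root
    return lines


# Demonstrate on a small throwaway file with made-up contents
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.hdf5")
    with h5py.File(path, "w") as f:
        f.create_dataset("tomography/mean_e1", data=np.zeros(4, dtype="f4"))
    listing = list_contents(path)

print("\n".join(listing))
```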
h5py
In Python, you read these files with the h5py library. Here’s an example opening one of the files generated by the example “laptop” pipeline in TXPipe:
import h5py
f = h5py.File("./data/example/outputs/shear_tomography_catalog.hdf5", "r")
# Print out the items in the root of the file
print(f.keys())
# prints <KeysViewHDF5 ['metacal_response', 'provenance', 'tomography']>
# showing the three groups generated by the tomography stage
We can create variables to represent groups in the file:
g = f["tomography"]
print(g.keys())
# prints <KeysViewHDF5 ['N_eff', 'N_eff_2d', 'mean_e1', 'mean_e1_2d', 'mean_e2', 'mean_e2_2d', 'sigma_e', 'sigma_e_2d', 'source_bin', 'source_counts', 'source_counts_2d']>
Printing a dataset doesn’t load it; it just shows the shape and type of the data:
print(g["mean_e1"])
# prints <HDF5 dataset "mean_e1": shape (4,), type "<f4">
Instead, we load datasets as numpy arrays with a slice:
e = g["mean_e1"][:]
print(e)
# prints [ 0.00283134 -0.0140038 0.0011645 -0.01299088]
For longer arrays we may want to just read a subset of the data:
b = g["source_bin"][0:100]
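When a column is too large to hold in memory at once, the same slicing can be used in a loop. The sketch below is one way to do this; the chunk size, the file name, and the dataset name big/values are arbitrary choices made for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

chunk_size = 1000  # arbitrary; pick something that fits comfortably in memory

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.hdf5")  # hypothetical file name

    # Create a made-up dataset to read back in pieces
    with h5py.File(path, "w") as f:
        f.create_dataset("big/values", data=np.arange(2500, dtype="i8"))

    total = 0
    n_chunks = 0
    with h5py.File(path, "r") as f:
        dset = f["big/values"]
        n = dset.shape[0]  # available without reading the data itself
        for start in range(0, n, chunk_size):
            end = min(start + chunk_size, n)
            chunk = dset[start:end]  # only this slice is read from disk
            total += int(chunk.sum())
            n_chunks += 1

print(total, n_chunks)
```

The last chunk is shorter than the others here, which is why the end index is clipped to the dataset length.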
Attributes
The easiest way to read attributes from h5py is to turn them into a dictionary:
d = dict(f['provenance'].attrs)
print(d)
# prints lots of provenance tracking information like all the package versions
# and configuration options
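Attributes can also be read one at a time, and they can be attached to the file itself, a group, or a dataset. The following sketch uses invented attribute names and a throwaway file, not the real TXPipe provenance keys.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.hdf5")  # hypothetical file name

    with h5py.File(path, "w") as f:
        f.attrs["creator"] = "example"       # attached to the whole file
        g = f.create_group("provenance")
        g.attrs["nbin"] = 4                  # attached to a group
        d = g.create_dataset("x", data=np.ones(3))
        d.attrs["units"] = "none"            # attached to a dataset

    with h5py.File(path, "r") as f:
        file_attrs = dict(f.attrs)
        group_attrs = dict(f["provenance"].attrs)
        # Numeric attributes come back as numpy scalars, so convert if needed
        nbin = int(f["provenance"].attrs["nbin"])
        # .get() supplies a fallback when an attribute is absent
        units = f["provenance/x"].attrs.get("units", "unknown")
        missing = f["provenance/x"].attrs.get("not_there", "unknown")

print(file_attrs, group_attrs, nbin, units, missing)
```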