Get Started

Importing HDF5 files

1. Importing HDF5 files

According to the 2013 O'Reilly book Python and HDF5 by Andrew Collette,

2. HDF5 files

"In the Python world, consensus is rapidly converging on Hierarchical Data Format version 5, or 'HDF5,' as the standard mechanism for storing large quantities of numerical data." How large are we talking here? According to Collette, "It’s now relatively common to deal with datasets hundreds of gigabytes or even terabytes in size; HDF5 itself can scale up to exabytes." Let's explore with a concrete example from LIGO, the Laser Interferometer Gravitational-Wave Observatory project.

3. Importing HDF5 files

You first import the package h5py and then import the file using 'h5py dot File', remembering to use 'r' in order to specify read only. Printing the datatype to the shell reveals that we are dealing with an h5py file.

4. The structure of HDF5 files

But what is the structure of this file? You can explore it's hierarchical structure as you would that of a Python dictionary using the method keys. You see that there are three keys, meta, quality and strain. Each of these is an HDF group. You can think of these groups as directories. The LIGO documentation tells us that 'meta' contains meta-data for the file, 'quality' contains information about data quality and 'strain' contains 'strain data from the interferometer', the main measurement performed by LIGO, the data of interest. If you knew what data and metadata should be in each group, you could access it straightforwardly. However, if not, due to the hierarchical nature of the file structure, it is easy to explore. For example,

5. The structure of HDF5 files

let's say you wanted to find out what type of metadata there is, you could easily print out the keys. Now you know the keys, you can access any metadata of interest. If you're interested in 'Description' and 'Detector', you can pass these keys to the numpy-dot-array function to convert the values to a NumPy array. You see that the data in the file is 'Strain data time series from LIGO' and that the detector used was 'H1'. Next perhaps you would like to check out the actual data? Great idea and that's precisely what you're going to do in the upcoming exercises!

6. The HDF Project

Before you do so, it is also worth noting that the HDF project is actively maintained by the HDF group, based in Champaign, Illinois and formerly part of the University of Illinois Urbana-Champaign.

7. Let's practice!

Now it's time for you to import and visualize some of the data that led to the validation of Einstein's Theory of Gravitational Waves, enjoy!