
Dask arrays from HDF5 datasets

You have been tasked with analyzing European rainfall over the last 40 years. The monthly average rainfall in a grid of locations over Europe has been provided for you in HDF5 format. Since this file is pretty large, you decide to load and process it using Dask.

h5py has been imported for you, and dask.array has been imported as da.

This exercise is part of the course Parallel Programming with Dask in Python.

Exercise instructions

  • Open the 'data/era_eu.hdf5' file using h5py.
  • Load the '/precip' variable into a Dask array using the from_array() function, and set the chunk shape to 12 months by 15 latitudes by 15 longitudes.
  • Use array slicing to select every 12th index along the first axis, starting from index 0, and store the result as january_rainfalls. This selects the January data from all years.
  • Take the mean of january_rainfalls along the time axis (axis 0) to calculate the mean rainfall in January across Europe.
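
The steps above can be tried on a small synthetic file before touching the real dataset. The sketch below is illustrative only: the file name synthetic_precip.hdf5, the array shape (3 years, 30 latitudes, 45 longitudes), and the random values are assumptions made for the example, not properties of 'data/era_eu.hdf5'. It also assumes the time axis starts in January, as the exercise implies.

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

# Build a small synthetic stand-in for the rainfall file.
# Assumed layout: (months, latitudes, longitudes); the real file may differ.
n_years, n_lat, n_lon = 3, 30, 45
rng = np.random.default_rng(0)
data = rng.random((n_years * 12, n_lat, n_lon))

path = os.path.join(tempfile.mkdtemp(), "synthetic_precip.hdf5")  # hypothetical file
with h5py.File(path, "w") as f:
    f.create_dataset("precip", data=data)

# Keep the file open while Dask reads from it: the array is loaded lazily,
# chunk by chunk, only when compute() is called.
with h5py.File(path, "r") as hdf5_file:
    # Wrap the on-disk dataset in a Dask array with 12 x 15 x 15 chunks
    precip = da.from_array(hdf5_file["/precip"], chunks=(12, 15, 15))

    # Every 12th index along axis 0, starting at 0, is a January
    january_rainfalls = precip[::12]

    # Average over time (axis 0) to get one value per grid location
    january_mean_rainfall = january_rainfalls.mean(axis=0)
    result = january_mean_rainfall.compute()

print(precip.numblocks)  # number of chunks along each axis
print(result.shape)      # one mean value per (latitude, longitude)
```

With this synthetic shape, each chunk holds one year of data for a 15 by 15 block of locations, so selecting every 12th time step touches one slice from each chunk along the time axis.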

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Open the HDF5 dataset using h5py
hdf5_file = ____.____(____)

# Load the file into a Dask array with a reasonable chunk size
precip = da.____(____, chunks=____)

# Select only the months of January
january_rainfalls = ____[____]

# Calculate the mean rainfall in January for each location
january_mean_rainfall = ____.____(axis=____)

plt.imshow(january_mean_rainfall.compute())
plt.show()