Parallel Programming with Dask in Python

Exercise

Dask arrays from HDF5 datasets

You have been tasked with analyzing European rainfall over the last 40 years. The monthly average rainfall for a grid of locations across Europe has been provided in HDF5 format. Because the file is large, you decide to load and process it using Dask.

h5py has been imported for you, and dask.array has been imported as da.

Instructions

  • Open the 'data/era_eu.hdf5' file using h5py.
  • Load the '/precip' variable into a Dask array using the da.from_array() function, and set the chunk shape to (12, 15, 15), that is, 12 months, 15 latitudes, and 15 longitudes per chunk.
  • Use array slicing to select every 12th index along the first axis; this selects the January data from all years.
  • Take the mean of january_rainfalls along the time axis (axis 0) to calculate the mean rainfall in January across Europe.
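
The steps above can be sketched as follows. This is a minimal illustration rather than the official solution: because 'data/era_eu.hdf5' is not available here, the sketch first writes a small hypothetical stand-in file (era_eu_demo.hdf5, 3 years on a 30 x 30 grid, random values) and then applies the same calls to it. The names hdf5_file, precip, and january_mean_rainfall are assumptions; january_rainfalls comes from the instructions. It also assumes the first time index corresponds to January.

```python
import os
import tempfile

import h5py
import numpy as np
import dask.array as da

# Hypothetical stand-in for 'data/era_eu.hdf5': 36 months on a 30 x 30 grid.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "era_eu_demo.hdf5")
demo = np.random.default_rng(0).random((36, 30, 30))
with h5py.File(path, "w") as f:
    f.create_dataset("precip", data=demo)

# Open the file read-only; it must stay open until the result is computed,
# because Dask only reads chunks from disk when .compute() is called.
with h5py.File(path, "r") as hdf5_file:
    # Wrap the on-disk dataset lazily, one chunk = 12 months x 15 lats x 15 lons.
    precip = da.from_array(hdf5_file["/precip"], chunks=(12, 15, 15))

    # Every 12th time step, starting at index 0 (assumed to be January).
    january_rainfalls = precip[::12]

    # Average over the time axis to get one mean value per grid cell.
    january_mean_rainfall = january_rainfalls.mean(axis=0)

    # Trigger the actual read and computation.
    result = january_mean_rainfall.compute()

print(precip.chunksize)
print(result.shape)
```

With the real file, only the path changes: open 'data/era_eu.hdf5' instead of the demo file. Matching the chunk length along the time axis to 12 months means each chunk holds exactly one year at that grid tile.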