Dask arrays from HDF5 datasets
You have been tasked with analyzing European rainfall over the last 40 years. The monthly average rainfall on a grid of locations over Europe has been provided to you in HDF5 format. Because the file is large, you decide to load and process it with Dask.
h5py has been imported for you, and dask.array has been imported as da.
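Dask helps with a large file because nothing is read from disk until a result is requested. A minimal sketch of that lazy behaviour, assuming a small synthetic HDF5 file as a stand-in for the real data (the file name `demo_lazy.hdf5`, the `(24, 30, 30)` shape, and the random values are all hypothetical):

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

# Hypothetical stand-in file: 24 months x 30 latitudes x 30 longitudes of random values
path = os.path.join(tempfile.mkdtemp(), "demo_lazy.hdf5")
values = np.random.default_rng(0).random((24, 30, 30))
with h5py.File(path, "w") as f:
    f.create_dataset("precip", data=values)

hdf5_file = h5py.File(path, "r")
dset = hdf5_file["/precip"]                      # h5py Dataset: a handle, no bulk data read
lazy = da.from_array(dset, chunks=(12, 15, 15))  # wraps the handle, still no bulk read
total = lazy.sum()                               # builds a task graph; a dask Array, not a number
result = total.compute()                         # now the chunks are read and reduced
print(result)
hdf5_file.close()
```

Until `.compute()` runs, `total` only records how to read and combine each chunk, so the whole array typically never has to sit in memory at once.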
This exercise is part of the course Parallel Programming with Dask in Python.
Exercise instructions
- Open the 'data/era_eu.hdf5' file using h5py.
- Load the '/precip' variable into a Dask array using the from_array() function, and set chunks of (12 months, 15 latitudes, and 15 longitudes).
- Use array slicing to select every 12th index along the first axis; this selects the January data from all years.
- Take the mean of january_rainfalls along the time axis (axis 0) to calculate the mean rainfall in January across Europe.
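The four steps above can be sketched end to end as follows. This is one possible solution, not the course's official one, and it assumes a synthetic stand-in for 'data/era_eu.hdf5' (the file name `era_eu_demo.hdf5`, the 3-year, 30 x 45 grid, and the gamma-distributed values are hypothetical), with time on axis 0 and the series starting in January:

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

# Hypothetical stand-in data: 3 years of monthly values on a 30 x 45 grid,
# time on axis 0, assumed to start in January
n_years, n_lat, n_lon = 3, 30, 45
path = os.path.join(tempfile.mkdtemp(), "era_eu_demo.hdf5")
values = np.random.default_rng(42).gamma(2.0, 30.0, size=(12 * n_years, n_lat, n_lon))
with h5py.File(path, "w") as f:
    f.create_dataset("precip", data=values)

# Step 1: open the HDF5 file with h5py (read-only)
hdf5_file = h5py.File(path, "r")

# Step 2: wrap '/precip' in a Dask array; each chunk is 12 months x 15 lat x 15 lon
precip = da.from_array(hdf5_file["/precip"], chunks=(12, 15, 15))

# Step 3: every 12th time step, starting at index 0, is a January
january_rainfalls = precip[::12]

# Step 4: average over the time axis to get one January mean per grid cell
january_mean_rainfall = january_rainfalls.mean(axis=0)

result = january_mean_rainfall.compute()
print(result.shape)  # (30, 45)
hdf5_file.close()
```

Only the final `.compute()` reads data; steps 2 to 4 just extend the task graph.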
Have a go at this exercise by completing this sample code.
```python
import dask.array as da
import h5py
import matplotlib.pyplot as plt

# Open the HDF5 file using h5py
hdf5_file = ____.____(____)
# Load the '/precip' dataset into a Dask array with a reasonable chunk size
precip = da.____(____, chunks=____)
# Select only the months of January
january_rainfalls = ____[____]
# Calculate the mean rainfall in January for each location
january_mean_rainfall = ____.____(axis=____)
# Trigger the computation and plot the resulting 2D map
plt.imshow(january_mean_rainfall.compute())
plt.show()
```
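The chunk shape (12, 15, 15) lines up with the calendar: each chunk along the time axis holds exactly one year, so slicing with a step of 12 takes one time step from each block. A small sketch that inspects this layout, assuming a hypothetical 40-year series on a 60 x 90 grid held in memory with NumPy (no HDF5 file needed):

```python
import dask.array as da
import numpy as np

# Hypothetical sizes: 40 years of monthly float64 data on a 60 x 90 grid
data = np.zeros((12 * 40, 60, 90))
precip = da.from_array(data, chunks=(12, 15, 15))

print(precip.numblocks)       # (40, 4, 6): one block per year along axis 0
january = precip[::12]
print(january.numblocks)      # (40, 4, 6): still one block per year
print(january.chunks[0][:3])  # (1, 1, 1): each yearly block contributes one January
# Bytes per full chunk: 12 * 15 * 15 values x 8 bytes per float64
print(12 * 15 * 15 * data.itemsize)  # 21600
```

With this alignment, every task in the January mean reads from a single yearly chunk, and no chunk is split across years.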