Dask arrays from HDF5 datasets
You have been tasked with analyzing European rainfall over the last 40 years. The monthly average rainfall in a grid of locations over Europe has been provided for you in HDF5 format. Since this file is pretty large, you decide to load and process it using Dask.
h5py has been imported for you, and dask.array has been imported as da.
This exercise is part of the course Parallel Programming with Dask in Python.
Exercise instructions
- Open the 'data/era_eu.hdf5' file using h5py.
- Load the '/precip' variable into a Dask array using the from_array() function, and set chunks of (12 months, 15 latitudes, and 15 longitudes).
- Use array slicing to select every 12th index along the first axis; this selects the January data from all years.
- Take the mean of january_rainfalls along the time axis (axis 0) to calculate the mean rainfall in January across Europe.
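To see why a step of 12 along the first axis picks out the Januaries, here is a small sketch on a hypothetical toy array (not the exercise data). It assumes the time axis starts in January and holds exactly one entry per month. The example uses NumPy only; the same slicing syntax applies to a Dask array.

```python
import numpy as np

# Hypothetical toy data: 3 years of monthly values on a 2 x 2 grid.
# Every value in a time step equals its month index (0 = January).
n_years = 3
month_index = np.arange(n_years * 12) % 12
toy = np.broadcast_to(month_index[:, None, None], (n_years * 12, 2, 2))

# A step of 12 along the first axis keeps time indices 0, 12, 24, ...
january_only = toy[::12]

print(january_only.shape)        # one time step per year
print(np.unique(january_only))   # only month index 0 remains
```

If the record started in a different month, the slice would need a matching start offset, for example toy[k::12] for some offset k.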
Interactive exercise
Try this exercise by completing this sample code.
# Open the HDF5 dataset using h5py
hdf5_file = ____.____(____)
# Load the file into a Dask array with a reasonable chunk size
precip = da.____(____, chunks=____)
# Select only the months of January
january_rainfalls = ____[____]
# Calculate the mean rainfall in January for each location
january_mean_rainfall = ____.____(axis=____)
plt.imshow(january_mean_rainfall.compute())
plt.show()
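One possible way to fill in the blanks is sketched below. Because the course file 'data/era_eu.hdf5' is not available outside the exercise environment, this sketch first writes a small synthetic stand-in to a temporary file; the file name era_eu_demo.hdf5, the array shape, and the random values are all assumptions made for illustration. The plotting step is left out so the example runs without a display.

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

# Hypothetical stand-in for 'data/era_eu.hdf5': 2 years of monthly
# values on a 30 x 30 grid, written to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "era_eu_demo.hdf5")
data = np.random.default_rng(0).random((24, 30, 30))
with h5py.File(path, "w") as f:
    f.create_dataset("precip", data=data)

# Open the HDF5 file read-only; nothing is loaded into memory yet.
hdf5_file = h5py.File(path, "r")

# Wrap the dataset lazily with chunks of
# (12 months, 15 latitudes, 15 longitudes).
precip = da.from_array(hdf5_file["/precip"], chunks=(12, 15, 15))

# Every 12th time step, starting at index 0, is a January
# (assuming the record starts in January).
january_rainfalls = precip[::12]

# Average over the time axis (axis 0): one value per grid location.
january_mean_rainfall = january_rainfalls.mean(axis=0)

# Reading from disk and the arithmetic only happen on compute().
result = january_mean_rainfall.compute()
print(result.shape)

hdf5_file.close()
```

In the exercise itself, the path passed to h5py.File() would be 'data/era_eu.hdf5', and the computed array can be handed to plt.imshow() as in the template above.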