Dask arrays from HDF5 datasets

You have been tasked with analyzing European rainfall over the last 40 years. The monthly average rainfall in a grid of locations over Europe has been provided for you in HDF5 format. Since this file is pretty large, you decide to load and process it using Dask.

h5py has been imported for you, and dask.array has been imported as da. The plotting calls in the template use plt, presumably matplotlib.pyplot.

This exercise is part of the course

Parallel Programming with Dask in Python

Instructions

  • Open the 'data/era_eu.hdf5' file using h5py.
  • Load the '/precip' variable into a Dask array using the from_array() function, and set the chunk shape to (12, 15, 15), that is, 12 months, 15 latitudes, and 15 longitudes per chunk.
  • Use array slicing to select every 12th index along the first axis, starting at index 0. This selects the January data from all years.
  • Take the mean of january_rainfalls along the time axis (axis 0) to calculate the mean rainfall in January across Europe.
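
The steps above can be tried out on a small synthetic file before touching the real data. The following is a minimal sketch, not the exercise solution as such: the file name `synthetic_rain.hdf5`, the 4-year span, and the 30 x 30 grid are all made up for illustration, and every grid cell at time step t simply stores the value t so that the result is easy to check by hand.

```python
import os
import tempfile

import numpy as np
import h5py
import dask.array as da

# Hypothetical stand-in for the real data: 4 years of monthly values
# on a 30 x 30 grid, where every cell at time step t holds the value t.
n_years, n_lat, n_lon = 4, 30, 30
steps = np.arange(n_years * 12, dtype="float64")[:, None, None]
data = steps * np.ones((1, n_lat, n_lon))

# Write the synthetic array to a temporary HDF5 file.
path = os.path.join(tempfile.mkdtemp(), "synthetic_rain.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("/precip", data=data)

# Reopen read-only and wrap the on-disk dataset in a lazy Dask array.
# The file has to stay open until compute() has finished reading from it.
with h5py.File(path, "r") as f:
    precip = da.from_array(f["/precip"], chunks=(12, 15, 15))
    january = precip[::12]                       # time steps 0, 12, 24, 36
    january_mean = january.mean(axis=0).compute()

print(precip.numblocks)      # number of chunks along each axis
print(january_mean.shape)    # one value per grid cell
print(january_mean[0, 0])
```

With this synthetic layout each cell of the result is the mean of 0, 12, 24, and 36, so any other value would point to a wrong slice or axis.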

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Open the HDF5 dataset using h5py
hdf5_file = ____.____(____)

# Load the file into a Dask array with a reasonable chunk size
precip = da.____(____, chunks=____)

# Select only the months of January
january_rainfalls = ____[____]

# Calculate the mean rainfall in January for each location
january_mean_rainfall = ____.____(axis=____)

plt.imshow(january_mean_rainfall.compute())
plt.show()
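
The call to compute() in the plotting line matters because Dask array operations are lazy: slicing and mean() only build a task graph. A minimal sketch of that behaviour, using a small in-memory array (the shape and the name `months` are made up for illustration):

```python
import numpy as np
import dask.array as da

# Hypothetical in-memory data: 24 monthly steps on a 4 x 4 grid.
raw = np.arange(24 * 16, dtype="float64").reshape(24, 4, 4)
months = da.from_array(raw, chunks=(12, 2, 2))

# Slicing and reducing only describe the work; nothing is computed yet.
lazy_mean = months[::12].mean(axis=0)

# compute() runs the task graph and returns a plain NumPy array,
# which is what a plotting function such as imshow expects.
result = lazy_mean.compute()

print(type(lazy_mean).__name__)
print(type(result).__name__)
print(result.shape)
```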