Chaining operations
Now that you have loaded and cleaned the data, you can begin analyzing it. Your first task is to look at the birth dates of the politicians. The birth dates are in string format like 'YYYY-MM-DD'
. The first 4 characters in the string are the year.
The filtered Dask bag you created in the last exercise, filtered_bag
, is available in your environment.
Diese Übung ist Teil des Kurses
Parallel Programming with Dask in Python
Anleitung zur Übung
- Use the bag's
.pluck()
method to extract'birth_date'
strings. - Write a lambda function to extract the year string from the
'birth_date'
strings and convert it into an integer. - Use the new bag
birth_year_bag
to calculate the min, max, and mean birth years. - Use the
dask.compute()
function to calculate the three aggregates efficiently.
Interaktive Übung
Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.
# Select the 'birth_date' from each dictionary in the bag
birth_date_bag = filtered_bag.____
# Extract the year as an integer from the birth_date strings
birth_year_bag = birth_date_bag.____(lambda x: ____)
# Calculate the min, max and mean birth years
min_year = ____
max_year = ____
mean_year = ____
# Compute the results efficiently and print them
print(____(____))