Chaining operations
Now that you have loaded and cleaned the data, you can begin analyzing it. Your first task is to look at the birth dates of the politicians. The birth dates are in string format like 'YYYY-MM-DD'
. The first 4 characters in the string are the year.
The filtered Dask bag you created in the last exercise, filtered_bag
, is available in your environment.
This exercise is part of the course
Parallel Programming with Dask in Python
Exercise instructions
- Use the bag's
.pluck()
method to extract'birth_date'
strings. - Write a lambda function to extract the year string from the
'birth_date'
strings and convert it into an integer. - Use the new bag
birth_year_bag
to calculate the min, max, and mean birth years. - Use the
dask.compute()
function to calculate the three aggregates efficiently.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Select the 'birth_date' from each dictionary in the bag
birth_date_bag = filtered_bag.____
# Extract the year as an integer from the birth_date strings
birth_year_bag = birth_date_bag.____(lambda x: ____)
# Calculate the min, max and mean birth years
min_year = ____
max_year = ____
mean_year = ____
# Compute the results efficiently and print them
print(____(____))