Get startedGet started for free

Chaining operations

Now that you have loaded and cleaned the data, you can begin analyzing it. Your first task is to look at the birth dates of the politicians. The birth dates are in string format like 'YYYY-MM-DD'. The first 4 characters in the string are the year.

The filtered Dask bag you created in the last exercise, filtered_bag, is available in your environment.

This exercise is part of the course

Parallel Programming with Dask in Python

View Course

Exercise instructions

  • Use the bag's .pluck() method to extract 'birth_date' strings.
  • Write a lambda function to extract the year string from the 'birth_date' strings and convert it into an integer.
  • Use the new bag birth_year_bag to calculate the min, max, and mean birth years.
  • Use the dask.compute() function to calculate the three aggregates efficiently.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Select the 'birth_date' from each dictionary in the bag
birth_date_bag = filtered_bag.____

# Extract the year as an integer from the birth_date strings
birth_year_bag = birth_date_bag.____(lambda x: ____)

# Calculate the min, max and mean birth years
min_year = ____
max_year = ____
mean_year = ____

# Compute the results efficiently and print them
print(____(____))
Edit and Run Code