
Using a DataFrame

In the previous exercise, you saw how to split up a task and use the low-level Python multiprocessing.Pool API to run calculations on several processing units.
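For reference, here is a minimal sketch of that Pool-based approach, assuming a pandas DataFrame with Year and Age columns. The names take_mean_age, parallel_apply and toy_events are hypothetical and are not taken from the previous exercise. Each (year, group) pair produced by groupby is sent to a worker process, and the partial results are concatenated afterwards.

```python
import multiprocessing

import pandas as pd


def take_mean_age(year_and_group):
    """Compute the mean age for one (year, group) pair produced by groupby."""
    year, group = year_and_group
    return pd.DataFrame({"Age": [group["Age"].mean()]}, index=[year])


def parallel_apply(apply_func, groups, n_cores):
    """Map apply_func over the groups with a process pool and stitch the results."""
    with multiprocessing.Pool(n_cores) as pool:
        results = pool.map(apply_func, groups)
    return pd.concat(results)


# Hypothetical toy stand-in for athlete_events (only Year and Age columns).
toy_events = pd.DataFrame({
    "Year": [1992, 1992, 1996, 1996, 2000],
    "Age": [20, 30, 25, 35, 40],
})

if __name__ == "__main__":
    # The guard keeps child processes from re-running the pool on platforms
    # that start workers with "spawn" (for example Windows and macOS).
    print(parallel_apply(take_mean_age, toy_events.groupby("Year"), n_cores=2))
```

The splitting, mapping and concatenating all have to be written by hand, which is exactly the bookkeeping a higher-level framework can take over.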

It's essential to understand parallelism at this lower level, but in practice you'll rarely use this kind of API directly. A more convenient way to parallelize an apply over several groups is to use a framework such as dask and its parallel abstraction of the pandas DataFrame.

The pandas DataFrame, athlete_events, is available in your workspace.

This exercise is part of the course Introduction to Data Engineering.



Have a go at this exercise by completing this sample code.

import dask.dataframe as dd

# Set the number of partitions
athlete_events_dask = dd.from_pandas(athlete_events, ____=____)