Using a DataFrame
In the previous exercise, you saw how to split up a task and use the low-level Python multiprocessing.Pool API to run calculations on several processing units.
It's essential to understand how this works at a lower level, but in practice you'll rarely call such a low-level API directly. A more convenient way to parallelize an apply over several groups is to use a framework such as dask, whose DataFrame mirrors the pandas DataFrame but is split into partitions that can be processed in parallel.
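To make concrete what such a framework automates, the sketch below computes the mean age per year the "manual" way: split a DataFrame into partitions, aggregate each partition, then combine the partial results. It is a minimal illustration under stated assumptions, not dask's implementation. The sample data is made up, the helper names (split_into_partitions, partial_stats, partitioned_mean_age) are hypothetical, and the per-partition step runs sequentially here. That is the step that multiprocessing.Pool.map or dask would distribute across processing units.

```python
import pandas as pd

# Hypothetical stand-in for athlete_events: only the two columns used here
df = pd.DataFrame({
    "Year": [1996, 1996, 2000, 2000, 2000, 2004, 2004, 2004],
    "Age": [24.0, 30.0, 22.0, 26.0, 27.0, 21.0, 25.0, 32.0],
})

def split_into_partitions(frame, n):
    """Split a DataFrame into at most n contiguous row blocks (hypothetical helper)."""
    size = -(-len(frame) // n)  # ceiling division
    return [frame.iloc[i:i + size] for i in range(0, len(frame), size)]

def partial_stats(part):
    """Per-partition sum and count of Age for each Year (hypothetical helper)."""
    return part.groupby("Year")["Age"].agg(["sum", "count"])

def partitioned_mean_age(frame, n):
    # Apply step: sequential here; a process pool would run these calls in parallel
    partials = list(map(partial_stats, split_into_partitions(frame, n)))
    # Combine step: add up sums and counts per Year, then divide once
    totals = pd.concat(partials).groupby(level=0).sum()
    return totals["sum"] / totals["count"]

result = partitioned_mean_age(df, 3)
print(result)
```

The combine step adds up sums and counts before dividing. Averaging the per-partition means instead gives a wrong answer whenever a group's rows are split unevenly across partitions: for 2000 in this sample it would yield 24.25 instead of 25.0.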
The pandas DataFrame, athlete_events, is available in your workspace.
This exercise is part of the course Introduction to Data Engineering.
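import dask.dataframe as dd

# Set the number of partitions: split athlete_events (provided by the workspace)
# into 4 pandas chunks that dask can process in parallel; 4 is an illustrative value
athlete_events_dask = dd.from_pandas(athlete_events, npartitions=4)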