MulaiMulai sekarang secara gratis

Using a DataFrame

In the previous exercise, you saw how to split up a task and use the low-level python multiprocessing.Pool API to do calculations on several processing units.

It's essential to understand this on a lower level, but in reality, you'll never use this kind of APIs. A more convenient way to parallelize an apply over several groups is using the dask framework and its abstraction of the pandas DataFrame, for example.

The pandas DataFrame, athlete_events, is available in your workspace.

Latihan ini adalah bagian dari kursus

Introduction to Data Engineering

Lihat Kursus

Latihan interaktif praktis

Cobalah latihan ini dengan menyelesaikan kode contoh berikut.

import dask.dataframe as dd

# Set the number of partitions
athlete_events_dask = dd.from_pandas(athlete_events, ____=____)
Edit dan Jalankan Kode