
Read Dask DataFrames from Parquet

In Chapter 1, you analyzed Spotify data that was split across multiple files to find the top hits of 2005-2020. You did this using the dask.delayed() function and a loop. Let's see how much easier this analysis becomes with Dask DataFrames.

dask.dataframe has been imported for you as dd.

This exercise is part of the course Parallel Programming with Dask in Python.

Exercise instructions

  • Load the Parquet data folder located in "data/spotify_parquet".
  • Use the DataFrame's .nlargest() method to find the top 10 songs by 'popularity'.
  • Convert the delayed object into a pandas DataFrame by computing it.

Hands-on interactive exercise

Complete the sample code below to finish this exercise.

# Read the spotify_parquet folder
df = ____

# Find the 10 most popular songs
top_10_songs = ____

# Convert the delayed result to a pandas DataFrame
top_10_songs_df = ____

print(top_10_songs_df)