Read Dask DataFrames from Parquet
In Chapter 1, you analyzed Spotify data that was split across multiple files to find the top hits of 2005-2020. You did this using the dask.delayed() function and a loop. Let's see how much easier this analysis becomes with Dask DataFrames.
dask.dataframe has been imported for you as dd.
This exercise is part of the course Parallel Programming with Dask in Python.
Exercise instructions
- Load the Parquet data folder located in "data/spotify_parquet".
- Use the DataFrame's .nlargest() method to find the top 10 songs by 'popularity'.
- Convert the delayed object into a pandas DataFrame by computing it.
Hands-on interactive exercise
Complete this sample code to finish the exercise.
# Read the spotify_parquet folder
df = ____
# Find the 10 most popular songs
top_10_songs = ____
# Convert the delayed result to a pandas DataFrame
top_10_songs_df = ____
print(top_10_songs_df)