Read Dask DataFrames from Parquet
In Chapter 1, you analyzed some Spotify data, split across multiple files, to find the top hits of 2005-2020. You did this using the dask.delayed() function and a loop. Let's see how much easier this analysis becomes with Dask DataFrames.
dask.dataframe has been imported for you as dd.
This exercise is part of the course Parallel Programming with Dask in Python.
Exercise instructions
- Load the Parquet data folder located in "data/spotify_parquet".
- Use the DataFrame's .nlargest() method to find the top 10 songs by 'popularity'.
- Convert the lazy result into a pandas DataFrame by computing it.
Interactive exercise
Try this exercise by completing the sample code.
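import dask.dataframe as dd

# Read every Parquet file in the spotify_parquet folder into one Dask DataFrame
df = dd.read_parquet("data/spotify_parquet")
# Find the 10 most popular songs (lazy: nothing is computed yet)
top_10_songs = df.nlargest(10, "popularity")
# Compute the lazy result to get a pandas DataFrame
top_10_songs_df = top_10_songs.compute()
print(top_10_songs_df)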