LoslegenKostenlos loslegen

Most popular songs

You have one more task on this Spotify data, which is to find the top 10 most popular songs across all available years. The algorithm you will need to use to compute this is to calculate the top 10 songs in each year, and then combine these and find the top 10 of the top 10s.

The following function, which finds the top 10 songs in a DataFrame, has been provided for you and is available in your environment.

def top_10_most_popular(df):
  return df.nlargest(n=10, columns='popularity')

dask and the delayed() function have been imported for you. pandas has been imported as pd. The list of filenames is available in your environment as filenames, and the year of each file is stored in the list years.

Diese Übung ist Teil des Kurses

Parallel Programming with Dask in Python

Kurs anzeigen

Anleitung zur Übung

  • For each file, find the top 10 songs in that year using the top_10_most_popular() function.
  • Compute the list of top 10s from each year, and select the first item of the resulting tuple.
  • Run the top_10_most_popular() function to find the top 10 songs over the concatenated DataFrame.

Interaktive Übung

Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.

top_songs = []

for file in filenames:
    df = delayed(pd.read_csv)(file)
    # Find the top 10 most popular songs in this file
    df_top_10 = ____
    top_songs.append(df_top_10)

# Compute the list of top 10s
top_songs_list = ____

# Concatenate them and find the best of the best
top_songs_df = pd.concat(top_songs_list)
df_all_time_top_10 = ____
print(df_all_time_top_10)
Code bearbeiten und ausführen