Most popular songs

You have one more task on this Spotify data: find the top 10 most popular songs across all available years. To compute this, calculate the top 10 songs in each year, then combine these results and find the top 10 of the top 10s.

The following function, which finds the top 10 songs in a DataFrame, has been provided for you and is available in your environment.

def top_10_most_popular(df):
  return df.nlargest(n=10, columns='popularity')
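
As a rough check of why the top-10-of-top-10s approach works, the sketch below compares it against a direct top 10 over all rows. It uses plain pandas and synthetic data: the `yearly` dictionary, the song names, and the popularity formula are made up for illustration and are not the Spotify files.

```python
import pandas as pd

def top_10_most_popular(df):
    return df.nlargest(n=10, columns='popularity')

# Hypothetical stand-in for the per-year files: 50 fake songs per year.
yearly = {
    year: pd.DataFrame({
        "name": [f"song_{year}_{i}" for i in range(50)],
        "popularity": [(i * 37 + year * 11) % 101 for i in range(50)],
    })
    for year in (2005, 2006, 2007)
}

# Two-step route: top 10 per year, then top 10 of the combined top 10s.
per_year_top = [top_10_most_popular(df) for df in yearly.values()]
two_step = top_10_most_popular(pd.concat(per_year_top, ignore_index=True))

# Direct route: top 10 over every row at once.
direct = top_10_most_popular(pd.concat(list(yearly.values()), ignore_index=True))

# Both routes should select the same popularity scores.
same_scores = sorted(two_step["popularity"]) == sorted(direct["popularity"])
print(same_scores)
```

Any song in the overall top 10 must also be in the top 10 of its own year, so no candidate is lost by discarding the rest of each year early. When scores are tied, the two routes may pick different rows with the same popularity, which is why the comparison is on scores rather than names.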

dask and the delayed() function have been imported for you. pandas has been imported as pd. The list of filenames is available in your environment as filenames, and the year of each file is stored in the list years.

This exercise is part of the course

Parallel Programming with Dask in Python

Instructions

  • For each file, find the top 10 songs in that year using the top_10_most_popular() function.
  • Compute the list of top 10s from each year, and select the first item of the resulting tuple.
  • Run the top_10_most_popular() function to find the top 10 songs over the concatenated DataFrame.
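
The steps above can be sketched on in-memory data as follows. This is one possible shape under stated assumptions, not the course's reference solution: `load_year` is a hypothetical stand-in for `pd.read_csv(file)`, and the data it returns is synthetic. The point of interest is that `dask.compute()` returns a tuple with one entry per argument, so passing a single list gives a one-element tuple whose first item is the list of results.

```python
import dask
import pandas as pd
from dask import delayed

def top_10_most_popular(df):
    return df.nlargest(n=10, columns='popularity')

def load_year(year):
    # Hypothetical replacement for pd.read_csv(file): 40 fake songs per year.
    return pd.DataFrame({
        "name": [f"song_{year}_{i}" for i in range(40)],
        "popularity": [(i * 53 + year * 7) % 100 for i in range(40)],
    })

years = [2005, 2006, 2007]

# Build lazy tasks: nothing is loaded or ranked yet.
top_songs = []
for year in years:
    df = delayed(load_year)(year)
    top_songs.append(delayed(top_10_most_popular)(df))

# compute() returns a tuple; its first item is the list of DataFrames.
result = dask.compute(top_songs)
top_songs_list = result[0]

# Concatenate the per-year winners and rank them once more.
all_time_top_10 = top_10_most_popular(pd.concat(top_songs_list, ignore_index=True))
print(all_time_top_10)
```

Wrapping both the loading and the ranking in `delayed()` lets Dask run each year's work independently, and only the small per-year top 10 tables are brought together at the end.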

Hands-on interactive exercise

Try this exercise by completing this sample code.

top_songs = []

for file in filenames:
    df = delayed(pd.read_csv)(file)
    # Find the top 10 most popular songs in this file
    df_top_10 = ____
    top_songs.append(df_top_10)

# Compute the list of top 10s
top_songs_list = ____

# Concatenate them and find the best of the best
top_songs_df = pd.concat(top_songs_list)
df_all_time_top_10 = ____
print(df_all_time_top_10)