
Most popular songs

You have one more task with this Spotify data: find the 10 most popular songs across all available years. To do this, find the top 10 songs in each year, combine those results, and then take the top 10 of the combined top 10s.

The following function, which finds the top 10 songs in a DataFrame, has been provided for you and is available in your environment.

def top_10_most_popular(df):
    return df.nlargest(n=10, columns='popularity')

dask and the delayed() function have been imported for you. pandas has been imported as pd. The list of filenames is available in your environment as filenames, and the year of each file is stored in the list years.
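
The top-10-of-top-10s approach works because any song in the overall top 10 must also be in the top 10 for its own year. The sketch below checks this with plain pandas on made-up data. The song names, years, and popularity scores are hypothetical and are not taken from the Spotify dataset.

```python
import pandas as pd


def top_10_most_popular(df):
    return df.nlargest(n=10, columns='popularity')


# Hypothetical per-year data standing in for the real files
frames = []
for year in (2018, 2019, 2020):
    frames.append(pd.DataFrame({
        'name': [f'song_{year}_{i}' for i in range(25)],
        'year': year,
        'popularity': [(i * 37 + year * 11) % 101 for i in range(25)],
    }))

# Top 10 per year, then the top 10 of those combined results
per_year_top = [top_10_most_popular(df) for df in frames]
best_of_best = top_10_most_popular(pd.concat(per_year_top, ignore_index=True))

# Top 10 computed directly over all rows, for comparison
direct = top_10_most_popular(pd.concat(frames, ignore_index=True))

# Both routes should give the same popularity scores in the same order
same_scores = list(best_of_best['popularity']) == list(direct['popularity'])
print(same_scores)
```

Reducing each year to 10 rows first means only a small DataFrame per file has to be combined at the end, which is why this pattern suits lazy, per-file processing.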

This exercise is part of the course

Parallel Programming with Dask in Python


Exercise instructions

  • For each file, find the top 10 songs in that year using the top_10_most_popular() function.
  • Compute the list of top 10s from each year, and select the first item of the resulting tuple.
  • Run the top_10_most_popular() function to find the top 10 songs over the concatenated DataFrame.
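
The steps above can be sketched as follows. This is a minimal illustration, not the course's reference solution: `load_year()` is a hypothetical stand-in for `pd.read_csv(file)` that builds made-up data in memory, so the example runs without any CSV files. It also shows why the first item is selected: `dask.compute()` returns a tuple with one entry per argument, even when only a single list is passed.

```python
import dask
import pandas as pd
from dask import delayed


def top_10_most_popular(df):
    return df.nlargest(n=10, columns='popularity')


def load_year(year):
    # Hypothetical replacement for pd.read_csv(file); the data is made up
    return pd.DataFrame({
        'name': [f'song_{year}_{i}' for i in range(20)],
        'year': year,
        'popularity': [(i * 13 + year) % 100 for i in range(20)],
    })


years = [2005, 2006, 2007]

top_songs = []
for year in years:
    df = delayed(load_year)(year)
    # Lazily find the top 10 most popular songs for this year
    df_top_10 = delayed(top_10_most_popular)(df)
    top_songs.append(df_top_10)

# One argument in, so a one-item tuple comes back
result = dask.compute(top_songs)
top_songs_list = result[0]

# Concatenate the per-year results and find the best of the best
top_songs_df = pd.concat(top_songs_list)
df_all_time_top_10 = top_10_most_popular(top_songs_df)
print(df_all_time_top_10)
```

Nothing is loaded or filtered until `dask.compute()` runs; up to that point `top_songs` is only a list of lazy `Delayed` objects.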

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

top_songs = []

for file in filenames:
    df = delayed(pd.read_csv)(file)
    # Find the top 10 most popular songs in this file
    df_top_10 = ____
    top_songs.append(df_top_10)

# Compute the list of top 10s
top_songs_list = ____

# Concatenate them and find the best of the best
top_songs_df = pd.concat(top_songs_list)
df_all_time_top_10 = ____
print(df_all_time_top_10)