Most popular songs
You have one more task on this Spotify data, which is to find the top 10 most popular songs across all available years. The algorithm you will need to use to compute this is to calculate the top 10 songs in each year, and then combine these and find the top 10 of the top 10s.
The following function, which finds the top 10 songs in a DataFrame, has been provided for you and is available in your environment.
def top_10_most_popular(df):
return df.nlargest(n=10, columns='popularity')
dask
and the delayed()
function have been imported for you. pandas
has been imported as pd
. The list of filenames is available in your environment as filenames
, and the year of each file is stored in the list years
.
This exercise is part of the course
Parallel Programming with Dask in Python
Exercise instructions
- For each file, find the top 10 songs in that year using the
top_10_most_popular()
function. - Compute the list of top 10s from each year, and select the first item of the resulting tuple.
- Run the
top_10_most_popular()
function to find the top 10 songs over the concatenated DataFrame.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
top_songs = []
for file in filenames:
df = delayed(pd.read_csv)(file)
# Find the top 10 most popular songs in this file
df_top_10 = ____
top_songs.append(df_top_10)
# Compute the list of top 10s
top_songs_list = ____
# Concatenate them and find the best of the best
top_songs_df = pd.concat(top_songs_list)
df_all_time_top_10 = ____
print(df_all_time_top_10)