Most popular songs
You have one more task on this Spotify data, which is to find the top 10 most popular songs across all available years. The algorithm you will need to use to compute this is to calculate the top 10 songs in each year, and then combine these and find the top 10 of the top 10s.
The following function, which finds the top 10 songs in a DataFrame, has been provided for you and is available in your environment.
def top_10_most_popular(df):
return df.nlargest(n=10, columns='popularity')
dask
and the delayed()
function have been imported for you. pandas
has been imported as pd
. The list of filenames is available in your environment as filenames
, and the year of each file is stored in the list years
.
Diese Übung ist Teil des Kurses
Parallel Programming with Dask in Python
Anleitung zur Übung
- For each file, find the top 10 songs in that year using the
top_10_most_popular()
function. - Compute the list of top 10s from each year, and select the first item of the resulting tuple.
- Run the
top_10_most_popular()
function to find the top 10 songs over the concatenated DataFrame.
Interaktive Übung
Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.
top_songs = []
for file in filenames:
df = delayed(pd.read_csv)(file)
# Find the top 10 most popular songs in this file
df_top_10 = ____
top_songs.append(df_top_10)
# Compute the list of top 10s
top_songs_list = ____
# Concatenate them and find the best of the best
top_songs_df = pd.concat(top_songs_list)
df_all_time_top_10 = ____
print(df_all_time_top_10)