Performing a semi join
Some of the tracks that have generated the most significant amount of revenue are from TV-shows or are other non-musical audio. You have been given a table of invoices that include top revenue-generating items. Additionally, you have a table of non-musical tracks from the streaming service. In this exercise, you'll use a semi join to find the top revenue-generating non-musical tracks.
The tables non_mus_tcks, top_invoices, and genres have been loaded for you.
This exercise is part of the course
Joining Data with pandas
Exercise instructions
- Merge
non_mus_tcksandtop_invoicesontidusing an inner join. Save the result astracks_invoices. - Use
.isin()to subset the rows ofnon_mus_tckswheretidis in thetidcolumn oftracks_invoices. Save the result astop_tracks. - Group
top_tracksbygidand count thetidrows. Save the result tocnt_by_gid. - Merge
cnt_by_gidwith thegenrestable ongidand print the result.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Merge the non_mus_tcks and top_invoices tables on tid
tracks_invoices = ____.merge(____)
# Use .isin() to subset non_mus_tcks to rows with tid in tracks_invoices
top_tracks = _____[non_mus_tcks['tid'].isin(____)]
# Group the top_tracks by gid and count the tid rows
cnt_by_gid = top_tracks.groupby(['gid'], as_index=False).agg({'tid':____})
# Merge the genres table to cnt_by_gid on gid and print
print(____)