Performing a semi join
Some of the tracks that have generated the most significant amount of revenue are from TV-shows or are other non-musical audio. You have been given a table of invoices that include top revenue-generating items. Additionally, you have a table of non-musical tracks from the streaming service. In this exercise, you'll use a semi join to find the top revenue-generating non-musical tracks.
The tables non_mus_tcks
, top_invoices
, and genres
have been loaded for you.
This exercise is part of the course
Joining Data with pandas
Exercise instructions
- Merge
non_mus_tcks
andtop_invoices
ontid
using an inner join. Save the result astracks_invoices
. - Use
.isin()
to subset the rows ofnon_mus_tcks
wheretid
is in thetid
column oftracks_invoices
. Save the result astop_tracks
. - Group
top_tracks
bygid
and count thetid
rows. Save the result tocnt_by_gid
. - Merge
cnt_by_gid
with thegenres
table ongid
and print the result.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Merge the non_mus_tcks and top_invoices tables on tid
tracks_invoices = ____.merge(____)
# Use .isin() to subset non_mus_tcks to rows with tid in tracks_invoices
top_tracks = _____[non_mus_tcks['tid'].isin(____)]
# Group the top_tracks by gid and count the tid rows
cnt_by_gid = top_tracks.groupby(['gid'], as_index=False).agg({'tid':____})
# Merge the genres table to cnt_by_gid on gid and print
print(____)