Performing a semi join

Some of the tracks that have generated the most revenue are from TV shows or are other non-musical audio. You have been given a table of invoices that include the top revenue-generating items, along with a table of non-musical tracks from the streaming service. In this exercise, you'll use a semi join to find the top revenue-generating non-musical tracks.

The tables non_mus_tcks, top_invoices, and genres have been loaded for you.

This exercise is part of the course

Joining Data with pandas

Exercise instructions

Merge non_mus_tcks and top_invoices on tid using an inner join. Save the result as tracks_invoices.
Use .isin() to subset the rows of non_mus_tcks where tid is in the tid column of tracks_invoices. Save the result as top_tracks.
Group top_tracks by gid and count the tid rows. Save the result to cnt_by_gid.
Merge cnt_by_gid with the genres table on gid and print the result.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Merge the non_mus_tcks and top_invoices tables on tid
tracks_invoices = ____.merge(____)

# Use .isin() to subset non_mus_tcks to rows with tid in tracks_invoices
top_tracks = _____[non_mus_tcks['tid'].isin(____)]

# Group the top_tracks by gid and count the tid rows
cnt_by_gid = top_tracks.groupby(['gid'], as_index=False).agg({'tid':____})

# Merge the genres table to cnt_by_gid on gid and print
print(____)
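
The four steps above can be sketched end to end on small stand-in tables. The data below is made up purely for illustration (the real non_mus_tcks, top_invoices, and genres tables are preloaded in the exercise, and their columns other than tid and gid are assumptions here, including the hypothetical iid and name columns). The point to notice is that the inner join in step 1 can repeat a track once per matching invoice row, while the .isin() filter in step 2 returns each matching row of non_mus_tcks exactly once, with only its original columns.

```python
import pandas as pd

# Hypothetical stand-ins for the preloaded tables (values are made up).
non_mus_tcks = pd.DataFrame({
    "tid": [1, 2, 3, 4],
    "name": ["Pilot", "Finale", "Interview", "Lecture"],
    "gid": [19, 19, 20, 21],
})
top_invoices = pd.DataFrame({
    "iid": [10, 11, 12, 13],   # hypothetical invoice id column
    "tid": [1, 1, 3, 99],      # track 1 appears on two invoices
})
genres = pd.DataFrame({
    "gid": [19, 20, 21],
    "name": ["TV Shows", "Drama", "Comedy"],
})

# Step 1: inner join on tid; track 1 is duplicated, one row per invoice match.
tracks_invoices = non_mus_tcks.merge(top_invoices, on="tid")

# Step 2: semi join filter; keep rows of non_mus_tcks whose tid has a match.
top_tracks = non_mus_tcks[non_mus_tcks["tid"].isin(tracks_invoices["tid"])]

# Step 3: count the matching tracks per genre id.
cnt_by_gid = top_tracks.groupby(["gid"], as_index=False).agg({"tid": "count"})

# Step 4: attach genre names to the counts.
result = cnt_by_gid.merge(genres, on="gid")
print(result)
```

In this sketch tracks_invoices has three rows (track 1 twice, track 3 once), whereas top_tracks has only two, which is why the count in step 3 is taken on the filtered table rather than on the merged one.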

Learn how you can merge disparate data using inner joins. By combining information from multiple sources you’ll uncover compelling insights that may have previously been hidden. You’ll also learn how the relationship between those sources, such as one-to-one or one-to-many, can affect your result.

Exercise 1: Inner join Exercise 2: What column to merge on? Exercise 3: Your first inner join Exercise 4: Inner joins and number of rows returned Exercise 5: One-to-many relationships Exercise 6: One-to-many classification Exercise 7: One-to-many merge Exercise 8: Merging multiple DataFrames Exercise 9: Total riders in a month Exercise 10: Three table merge Exercise 11: One-to-many merge with multiple tables

Take your knowledge of joins to the next level. In this chapter, you’ll work with TMDb movie data as you learn about left, right, and outer joins. You’ll also discover how to merge a table to itself and merge on a DataFrame index.

Exercise 1: Left join Exercise 2: Counting missing rows with left join Exercise 3: Enriching a dataset Exercise 4: How many rows with a left join? Exercise 5: Other joins Exercise 6: Right join to find unique movies Exercise 7: Popular genres with right join Exercise 8: Using outer join to select actors Exercise 9: Merging a table to itself Exercise 10: Self join Exercise 11: How does pandas handle self joins? Exercise 12: Merging on indexes Exercise 13: Index merge for movie ratings Exercise 14: Do sequels earn more?

In this chapter, you’ll leverage powerful filtering techniques, including semi-joins and anti-joins. You’ll also learn how to glue DataFrames by vertically combining and using the pandas.concat function to create new datasets. Finally, because data is rarely clean, you’ll also learn how to validate your newly combined data structures.

Exercise 1: Filtering joins Exercise 2: Steps of a semi join Exercise 3: Performing an anti join Exercise 4: Performing a semi join (current exercise) Exercise 5: Concatenate DataFrames together vertically Exercise 6: Concatenation basics Exercise 7: Concatenating with keys Exercise 8: Verifying integrity Exercise 9: Validating a merge Exercise 10: Concatenate and merge to find common songs
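
As a companion to the semi join in the current exercise, here is a minimal sketch of the other two ideas this chapter names: an anti join and merge validation. The tables and values are hypothetical. It relies on the indicator and validate parameters of DataFrame.merge; the approach shown is one common way to express an anti join in pandas, not necessarily the exact solution used in the course.

```python
import pandas as pd

# Hypothetical tables (values are made up for illustration).
genres = pd.DataFrame({"gid": [1, 2, 3], "name": ["Rock", "Jazz", "Opera"]})
tracks = pd.DataFrame({"tid": [10, 11, 12], "gid": [1, 1, 2]})

# Left join with indicator=True adds a _merge column that records
# whether each row matched ("both") or came from the left only ("left_only").
merged = genres.merge(tracks, on="gid", how="left", indicator=True)

# Anti join: keep the genres that have no matching track.
unmatched_gids = merged.loc[merged["_merge"] == "left_only", "gid"]
genres_without_tracks = genres[genres["gid"].isin(unmatched_gids)]
print(genres_without_tracks)

# Validation: ask pandas to check the expected relationship while merging.
# tracks has gid 1 twice, so a one-to-one check is expected to fail here.
validation_failed = False
try:
    tracks.merge(genres, on="gid", validate="one_to_one")
except pd.errors.MergeError as err:
    validation_failed = True
    print("Validation failed:", err)
```

Declaring the relationship as many_to_one instead would pass on this data, because each gid appears only once in genres.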

In this final chapter, you’ll step up a gear and learn to apply pandas' specialized methods for merging time-series and ordered data together with real-world financial and economic data from the city of Chicago. You’ll also learn how to query resulting tables using a SQL-style format, and unpivot data using the melt method.

Exercise 1: Using merge_ordered() Exercise 2: Correlation between GDP and S&P500 Exercise 3: Phillips curve using merge_ordered() Exercise 4: merge_ordered() caution, multiple columns Exercise 5: Using merge_asof() Exercise 6: Using merge_asof() to study stocks Exercise 7: Using merge_asof() to create dataset Exercise 8: merge_asof() and merge_ordered() differences Exercise 9: Selecting data with .query() Exercise 10: Explore financials with .query() Exercise 11: Subsetting rows with .query() Exercise 12: Reshaping data with .melt() Exercise 13: Select the right .melt() arguments Exercise 14: Using .melt() to reshape government data Exercise 15: Using .melt() for stocks vs bond performance Exercise 16: Course wrap-up