Get startedGet started for free

Joining DataFrames

1. Joining DataFrames

Now let's learn how to combine DataFrames using joins.

2. Reviews data

So far, we have review scores and prices,

3. The inspections dataset

But we also need to combine this data with hygiene ratings. We can find matching rows using a join. But what is a join?

4. What is a join?

A join combines two DataFrames by finding matching rows between them. Let's say we have this DataFrame of World Cup winners. We call it the left DataFrame.

5. What is a join?

And we want to combine it with this DataFrame to get the region of each country. We call this the right DataFrame. A join identifies

6. What is a join?

where we have matching rows, like here for Brazil.

7. What is a join?

And creates a new DataFrame.

8. What is a join?

where the region column has the right data for each row.

9. Join on a single column

Now we want to combine the inspections and reviews DataFrames. We start with the .join() method on our reviews DataFrame.

10. Join on a single column

Inside join, we pass the inspections DataFrame.

11. Join on a single column

And set on to "business" to match rows with the same business name.

12. Join on a single column

We set the how argument to inner, which is also the default. An inner join only keeps rows that appear in both DataFrames. Bang Bang Burger is excluded because it doesn't appear in the inspections DataFrame. But we have a problem. There are 2 Costa Coffee locations. Therefore, we get 2 incorrect rows with the review of one Costa location, but the inspection score of another.

13. Join on multiple columns

The business column alone doesn't identify matches uniquely. To fix this, we join on both business and location by passing a list of columns.

14. Join on multiple columns

Now each restaurant matches exactly one inspection.

15. Left join

For our app, we want to keep all restaurants with reviews, even if some don't have inspection data yet. For this, we need a left join that keeps every row from the left DataFrame and fills with null when there is no match on the right.

16. Left join

So we set how to left. Now Bang Bang Burger appears with null for type and hygiene_rating since it has no inspection match.

17. Full join

If we want to keep any restaurant that has a review or an inspection, we use a full join. If a row from either side has no match, the missing columns are filled with null.

18. Full join

So we set how to full.

19. Full join

By default, the output would have the join columns - business and location - repeated from each DataFrame. We set coalesce to True so we get a single column for each. Wagamama from the inspections DataFrame now appears with null for review and price, and Bang Bang Burger has null for the inspection columns.

20. Choosing a join strategy

To summarize:

21. Inner join

Use inner when you only want matching rows.

22. Left join

Left when you want to keep everything from your left DataFrame.

23. Full join

And full when you want to keep everything from both DataFrames.

24. Let's practice!

Time to practice joining DataFrames!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.