Joining DataFrames
1. Joining DataFrames
Now let's learn how to combine DataFrames using joins.2. Reviews data
So far, we have review scores and prices,3. The inspections dataset
But we also need to combine this data with hygiene ratings. We can find matching rows using a join. But what is a join?4. What is a join?
A join combines two DataFrames by finding matching rows between them. Let's say we have this DataFrame of World Cup winners. We call it the left DataFrame.5. What is a join?
And we want to combine it with this DataFrame to get the region of each country. We call this the right DataFrame. A join identifies6. What is a join?
where we have matching rows, like here for Brazil.7. What is a join?
And creates a new DataFrame.8. What is a join?
where the region column has the right data for each row.9. Join on a single column
Now we want to combine the inspections and reviews DataFrames. We start with the .join() method on our reviews DataFrame.10. Join on a single column
Inside join, we pass the inspections DataFrame.11. Join on a single column
And set on to "business" to match rows with the same business name.12. Join on a single column
We set the how argument to inner, which is also the default. An inner join only keeps rows that appear in both DataFrames. Bang Bang Burger is excluded because it doesn't appear in the inspections DataFrame. But we have a problem. There are 2 Costa Coffee locations. Therefore, we get 2 incorrect rows with the review of one Costa location, but the inspection score of another.13. Join on multiple columns
The business column alone doesn't identify matches uniquely. To fix this, we join on both business and location by passing a list of columns.14. Join on multiple columns
Now each restaurant matches exactly one inspection.15. Left join
For our app, we want to keep all restaurants with reviews, even if some don't have inspection data yet. For this, we need a left join that keeps every row from the left DataFrame and fills with null when there is no match on the right.16. Left join
So we set how to left. Now Bang Bang Burger appears with null for type and hygiene_rating since it has no inspection match.17. Full join
If we want to keep any restaurant that has a review or an inspection, we use a full join. If a row from either side has no match, the missing columns are filled with null.18. Full join
So we set how to full.19. Full join
By default, the output would have the join columns - business and location - repeated from each DataFrame. We set coalesce to True so we get a single column for each. Wagamama from the inspections DataFrame now appears with null for review and price, and Bang Bang Burger has null for the inspection columns.20. Choosing a join strategy
To summarize:21. Inner join
Use inner when you only want matching rows.22. Left join
Left when you want to keep everything from your left DataFrame.23. Full join
And full when you want to keep everything from both DataFrames.24. Let's practice!
Time to practice joining DataFrames!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.