Get startedGet started for free

Combining common data with inner joins

1. Combining common data with inner joins

Now that you've worked with left and right joins, let's talk about inner joins. Inner joins are useful when we want to preserve only the data covered by both datasets. This can be the case in machine learning when it's desirable to avoid missing data in the combined data frame.

2. Object-oriented expressions

Before we start with inner joins though, let's talk about a bit of pandas shorthand. There's an alternative way to write merge statements that's just as intuitive and slightly more consistent with other Python expressions. We can write the statement using the form of "data frame left, dot-merge, data frame right". The result from this expression is the same as from the other format because pandas uses the same code behind the scenes. We'll use this new object-oriented notation whenever possible for the rest of this course. In practice, it generally doesn't matter and you can use whichever way works best.

3. A basic inner join

Now, back to inner joins. Here's an example of a basic inner join. The table on the left has rows with even Game Keys between 2 and 8. The table on the right has Game Keys 3 through 6. There are several steps needed in a spreadsheet environment to get only the games seen in both tables. You would first create v-lookup columns in one table and reference the other table as in a left or right join. Then you would remove rows showing NA to get only the rows present in both tables. In pandas we simply apply the how='inner' argument to our merge statement. This applies the same logic and returns a data frame with common rows and all columns.

4. Joining on different names

Sometimes we want to join tables on key columns that represent the same feature but have different names. In this example you can see that 'game key' is capitalized and punctuated differently in the tables. The merge method can handle this difference without the need to rename columns. Instead of the 'on' parameter, we simply use the 'left-on' and ‘right-on’ parameters to match the joining column for both tables. Pandas will merge the tables as if the names were identical. By the way, the resulting data frame will use the column name from the left-hand table.

5. Let's practice!

We've covered a lot of ground. Now it's time to finish the chapter with one more practice session.