Get startedGet started for free

A closer look at one-to-one joins

1. A closer look at one-to-one joins

In this video you'll learn how to create one-to-one joins with pandas. Let's get right to it.

2. Basics of pandas.merge()

Earlier in the course, we discussed complementary datasets where the columns hold different features and the rows describe either the same or different items. Joining the datasets is somewhat like joining two interlocking pieces of wood. In pandas you can join the data using the pandas-merge function. This versatile function merges two data frames based on joining one or more key columns. Unlike the pandas-concat function, the key column values do not have to match exactly. As mentioned before, pandas merge works in a way similar to VLOOKUP for spreadsheets. Let's take a look.

3. Left merges

In this example we have two tables, both based on Game Key. You can perform a left merge and get the same result as a v-lookup formula. Inside the pandas merge function, list the names of both data frames, with the left data frame as the one you want to pull values into. Then specify ‘how = left’ in the merge statement. Just as with v-lookup, the resulting data frame will have the same rows as the left-hand data frame. it will have columns from both tables.

4. Right merges

We can instead specify how=right and perform a right merge. The concept here is the same as with the left merge. The difference is now the resulting data frame has the same row structure as the right-hand data frame instead of the left one. As before, it includes columns from both tables.

5. A framework for joins

Before going further, let's establish a guiding framework for joins. This framework will help us focus as we progress through different joins. Here you can see 3 steps to follow. The three steps are: one Determine the relationship two, Check for unique key columns and three, Write and execute the merge statement Note that writing the code and executing the merge is the last step.

6. Joining on two keys

Let's look at the framework in the context of joining data based on two key columns. Here you see two tables. Let's see how to best join them. First, we determine the relationship between the two tables. Both tables have GameId and PlayId columns followed by other data. These tables appear to be at the same level, organized by Games and/or plays. Also, the tables have complementary data for the same games and plays. This looks like a one-to-one, vlookup-style join. Next lets check uniqueness of the key columns. You can see that games are not unique, nor are plays. But, combinations of games and plays appear to be unique. These two columns together are the keys. Now it's time to write the merge statement. It's pretty straightforward. The only change from before is that when joining on two or more keys, you need to put the column names in a bracketed list. That's it!

7. Let's practice!

Speaking of merges, let's merge your new skills with some exercises!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.