Merging: The VLOOKUP of Python
1. Merging: The VLOOKUP of Python
For the final lesson in this chapter, we will combine the data from separate DataFrames into one dataset.2. VLOOKUP
Let's start with what we know. Imagine columns A and B are from our fruit colors DataFrame, and columns E and F are from our fruit prices DataFrame.3. VLOOKUP - the key column
Both tables share a common column, name, which enables these tables to be combined. This common column is known as the key column in data science. Tables need key columns to be joined together. Note that it doesn't matter that the Apple row, row 2, is the second row of both tables. What does matter, though, is that Apple is present in the name column of both tables. This is what ensures we can merge Apples with Apples.4. VLOOKUP - A left join
The specific kind of joining we're doing here is referred to as a left join. That's because we have a table on our left, and we are attaching the right-side table to it. You can think of the left table, the color table, as the destination for the right table. The right table will attach to the left table wherever it can find a match in the key column. While there are a few different flavors of joins, we will focus on left joins in this course, as they intuitively function like VLOOKUPs.5. VLOOKUP - #N/A
And while we won't explore cases where there are values in your left table not present in the right table, it's worth acknowledging that Python will react intuitively to a scenario like this, where we are missing a Kiwi entry in our right table. A spreadsheet hands us hashtag-N-A and Python will hand us a similar kind of N-A. So here are two6. The .merge() method
DataFrames in Python, fruit-underscore-colors and fruit-underscore-prices.7. The .merge() method
We'll join them together with the dot-merge method. The first argument we pass to the dot-merge method is the name of the DataFrame with which we wish to merge, here fruit-underscore-prices. The second argument, on, should equal the column name of the key column shared between the two tables. The key column in our dataset is the name column, so we specify 'name' in quotations. The last argument, how, specifies what type of join we'll be utilizing. As mentioned before, we'll only be working with left joins in this course, so we specify 'left' in quotations. And in8. Key columns without the same name
the event your key column names are not called the same thing between tables, like here, where our key columns are name and fruit-underscore-type, respectively, we can use some alternative arguments in dot-merge to help us out.9. left_on and right_on in .merge()
Here, instead of using the on argument in dot-merge, we use left-underscore-on and right-underscore-on to instruct Python on which uniquely-named columns from each table are to be used as the key columns. Also note how in our result, both the left and right keys are preserved. We can drop one of these accordingly with the dot-drop method as needed.10. Your turn!
Now it's your turn to perform some left joins in the exercises.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.