Get startedGet started for free

Preparing to put tables together

1. Preparing to put tables together

Now that we have multiple related tables, the next logical step would be to piece them together.

2. Table merging

Merging tables, which we will explicitly cover in the next lesson, involves taking two tables with common fields,

3. Table merging

here our name columns, and merging them together to create a more holistic dataset.

4. Merging: The VLOOKUP of Data Science

The closest spreadsheets comparison to a table merge, or join, as it's often called, is a VLOOKUP. To review,

5. Merging: The VLOOKUP of Data Science

a VLOOKUP will take a value, here Apple in cell A2, and compare it to another table,

6. Merging: The VLOOKUP of Data Science

here cells E2 through F9. The VLOOKUP finds the rows where Apple is located in the right-hand table, and outputs a corresponding value in the left-hand table, here price. Merging, or joining, in Python works a little differently, and so in this lesson we'll cover preparatory techniques for handling these differences.

7. Merging in Python

First, Python is case-sensitive, so capital-A Apple is not the same as lowercase-a apple. The corresponding columns must be exact matches, otherwise the join won't work as expected.

8. Merging in Python

Second, it's common to make sure leading and trailing spaces don't negatively affect our join as well.

9. Merging in Python

Finally, Python joins entire tables, so if there's a column or two we don't want in the final dataset, we must remove them ourselves.

10. Handling case-sensitivity

Imagine if the price data we parsed had fruit names in all caps, and the color data we parsed had name in title case, where the first letter of each word is capitalized. Because Python is case-sensitive, merging on name would not work. To correct,

11. .str.title() method

we can change the case of a column with the dot-str-dot-title method. As we've seen previously when filtering, we can access an existing column by placing the column name in quotes within brackets next to the name of our DataFrame. Then we place the dot-str-dot-title method to alter the name column's entries to be title case instead of uppercase. To save this change, we simply assign

12. .str.title() method

the existing column to what we've written in the previous slide.

13. .str.upper() method

Similarly, the dot-str-dot-upper method turns all column entries into uppercase characters and the

14. .str.lower() method

dot-str-dot-lower method turns all column entries into lowercase characters. For the name

15. .str.title() method

column, we'll stick with title case. This one

16. Handling whitespace with .str.strip()

may be hard to see, but the dot-str-dot-strip method will remove any leading and trailing spaces in our entries. Just like we use dot-str-dot-upper or dot-str-dot-lower, we use dot-str-dot-strip. This is a useful precautionary method for real world data.

17. Selecting and dropping columns

Sometimes we have unnecessary columns that we won't need when we merge tables, like here, with our unneeded columns, full of useless entries. We'll look at two approaches for removal.

18. Selecting columns

First, selecting columns is as easy as inserting a list of column names within a set of brackets next to the variable. Note there are two sets of brackets here, and selecting keeps the desired columns. Next

19. Selecting columns

we assign our leaner DataFrame back to fruit-underscore-price. Alternatively, a DataFrame's dot-drop

20. Dropping Columns with .drop()

method can be used to delete columns by passing a column name or a list of column names, along with the argument axis-equals-1. The axis argument tells Python to delete columns. I like to remember this with the idea that columns, like 1's, go up and down vertically.

21. Dropping Columns

Again, we assign our leaner DataFrame back to fruit-underscore-prices DataFrame.

22. Your turn!

Now it's your turn to prepare tables for merging!