
Concatenating DataFrames

1. Concatenating DataFrames

Welcome to Chapter 3! Real-world data rarely lives in a single file. In this chapter, we'll learn how to combine DataFrames.

2. Back to the restaurant app

Let's return to our restaurant recommendation app. We'll use these restaurant review scores, which we can then combine with the hygiene data and user preferences.

3. Concatenation

Our restaurant review scores are stored in multiple CSV files.

4. Concatenation

And we need to combine them into a single DataFrame. Combining DataFrames like this is called concatenation.

5. Our central London data

Our first file has restaurants from central London. Each row has the business name, location, average review score, and average meal price in pounds.

6. Our South London data

The second file covers South London restaurants with the same four columns. We want to put these two DataFrames with the same columns one on top of the other. This is called vertical concatenation.

7. Vertical concatenation

To concatenate the DataFrames, we use the pl.concat function.

8. Vertical concatenation

We pass a list of the DataFrames we want to combine.

9. Vertical concatenation

We then set the how argument to vertical, which is also the default. Now we have all the restaurants from both files in one DataFrame. Here we passed two DataFrames in the list, but we can pass as many as we need.

10. Vertical concatenation

If all of the CSVs have similar names and the same columns, we can do vertical concatenation directly from the CSVs. Here we call pl.read_csv and use a wildcard * to say we want all files that begin with restaurants and end with .csv.

11. A third batch with missing data

We received a third batch of new listings from North London. However, these new partners haven't provided prices yet, so the price column is missing.

12. Vertical concat fails

If we try vertical concat with all three, Polars raises a ShapeError because the new listings have 3 columns while the others have 4.

13. Diagonal concatenation

Instead, we use diagonal concatenation by setting the how argument to diagonal.

14. Diagonal concatenation

Diagonal concat combines all columns from every DataFrame. Where a column doesn't exist in a file, the values are filled with null. The last two rows have null for price because the new listings didn't include it. Next, let's look at horizontal concatenation.

15. Horizontal concatenation

A colleague provides us with cuisine categories for each restaurant in the same order as our DataFrame. We want to add this column to our main DataFrame. This is called horizontal concatenation.

16. Horizontal concatenation

We set the how argument to horizontal, and the cuisine column is added to our DataFrame.

17. Appending with extend

Finally, let's look at a quick way to append rows. Sometimes we need to add a small DataFrame with the same columns to an existing one. Here we have one new restaurant to add.

18. Appending with extend

The extend method adds rows to the existing DataFrame in place. It's especially efficient when adding small batches to large DataFrames.

19. Let's practice!

Now it's your turn to combine DataFrames with concat and extend!
