
Adding columns

1. Adding columns

Let's continue our journey by exploring how to add new columns to our DataFrame.

2. Adding a new column

We want to add a new column that calculates how many people can sleep in each property using the doubles and singles columns.

3. Adding a new column

To add this new column we use the .with_columns method.

4. Adding a new column

We first use an expression where we multiply the number of double beds by two

5. Adding a new column

and then add the number of single beds.

6. Adding a new column

Finally, we add .alias to the chain to name the newly created column "total". We see that the second property can sleep 2 more people than the first property because it has an extra double bed.

7. .with_columns() or .select()?

The difference between .with_columns and .select is that .with_columns adds or updates columns while keeping the other columns in the DataFrame,

8. .with_columns() or .select()?

whereas .select returns a subset of columns.

9. Adding an aggregated column

To compare properties we want to add a column with the average price. We create this column by using an aggregation expression in .with_columns. Here, that's pl.col("price").mean() to get the average price, followed by .alias to name the new column. And we see our new column added to the DataFrame, with the average repeated on every row. But what would happen if we didn't use .alias? Well, in this case the expression would keep the name price, so we would overwrite the existing price column rather than adding a new column.

10. Changing the dtype of a column

The dtype of a column can also be changed. For example, the bedrooms column is a 64-bit integer. With 64-bit integers, we allow for properties with more than nine quintillion rooms, which is perhaps more than we need! If we convert this column to 16-bit integers, we can still handle properties with up to 32,767 rooms. Converting from 64-bit to 16-bit integers reduces memory usage and can make our code run faster.

11. Changing the dtype of a column

We can change the dtype of a column with the .cast expression, where we pass the Polars dtype that we want to convert to. Here we cast the bedrooms column to 16-bit integer. In this case, we don't finish the expression with .alias because we want to overwrite the existing column rather than creating a new column.

12. Renaming columns

Now we want to rename columns to make them more informative. In this example, we want to rename doubles to double_beds and singles to single_beds.

13. Renaming columns

We can rename columns using the .rename method on a DataFrame. We pass a dictionary to .rename that maps the current column names to new ones. This gives a DataFrame with updated column names.

14. Removing a column

We can remove columns by calling the .drop method and passing the columns to be dropped as arguments. Here, we drop the review and beach columns, which returns the DataFrame without them.

15. Let's practice!

Now that we've learned how to add, update, and remove columns in a DataFrame, it's time for you to practice.