1. Adding columns
Let's continue our journey
by exploring how to add new columns to our DataFrame.
2. Adding a new column
We want to add a new column
that calculates how many people can sleep in each property
using the doubles and singles columns.
3. Adding a new column
To add this new column
we use the .with_columns method.
4. Adding a new column
We first use an expression where we multiply the number of double beds by two
5. Adding a new column
and then add the number of single beds.
6. Adding a new column
Finally, we add .alias to the chain to name the newly created column "total".
We see that the second property can sleep 2 more people than the first property because it has an extra double bed.
7. .with_columns() or .select()?
The difference between .with_columns and .select is that
.with_columns adds or updates columns while keeping the other columns in the DataFrame,
8. .with_columns() or .select()?
whereas .select returns only the columns we specify.
9. Adding an aggregated column
To compare properties, we want to add a column with the average price.
We create this column by using an aggregation expression in .with_columns. Here, that's pl.col("price").mean() to get the average price, followed by .alias to name the new column.
And we see our new column added to the DataFrame, with the average repeated on every row. But what would happen if we didn't use .alias? In that case, the output would keep the name price, so we would overwrite the existing price column rather than adding a new column.
10. Changing the dtype of a column
The dtype of a column can also be changed.
For example,
the bedrooms column is a 64-bit integer. With 64-bit integers, we allow for properties with more than nine quintillion bedrooms, which is perhaps more than we need! If we convert this column to 16-bit integers, we can still handle properties with more than 32 thousand bedrooms. This conversion from 64-bit to 16-bit integers reduces memory usage and can make our code run faster.
11. Changing the dtype of a column
We can change the dtype of a column
with the .cast expression, where we pass the Polars dtype that we want to convert to. Here we cast the bedrooms column to a 16-bit integer with pl.Int16. In this case, we don't finish the expression with .alias
because we want to overwrite the existing column rather than creating a new column.
12. Renaming columns
Now we want to rename columns to make them more informative.
In this example,
we want to rename doubles to double_beds and singles to single_beds.
13. Renaming columns
We can rename columns
using the .rename method on a DataFrame. We pass a dictionary to .rename that maps the current column names to new ones.
This gives a DataFrame with updated column names.
14. Removing a column
We can remove columns
by calling the .drop method and passing the names of the columns to drop as arguments.
This returns the DataFrame without the dropped review and beach columns.
15. Let's practice!
Now that we've learned how to add, update and remove columns in a DataFrame, it's time for you to practice.