1. Introduction to the Polars DataFrame
Hi, I'm Liam.
2. Meet your instructor
I'm an experienced data scientist and Polars contributor.
I'll guide you through working with Polars.
3. Tabular data
Like pandas, Polars is designed to work with tabular data, which is structured in rows and columns.
We'll use this vacation rentals dataset to explore working with Polars!
4. Differences between Polars and pandas
While Polars and pandas both provide DataFrame functionality,
Polars is typically faster than pandas because it makes greater use of parallel computation.
It also has a lazy mode that optimizes our queries before running them. We'll learn more about lazy mode later in the course.
5. Underneath the hood
Polars is written in Rust and stores its data in the Apache Arrow memory format. Apache Arrow stores tabular data efficiently in memory, and the Rust language allows for fast processing. Together, these two technologies give Polars its speed advantage over libraries like pandas.
6. Course outline
In this course, we'll learn how to work with data using Polars. We'll start by seeing how to load a CSV file into a DataFrame and explore its contents.
7. Course outline
We'll then learn how to transform data and build optimized queries with lazy mode.
8. Course outline
Finally, we'll analyze data with filtering and aggregation.
9. Reading a CSV
Now let's create a DataFrame.
We start by importing polars as pl.
We use the pl.read_csv() function to load our vacation properties dataset into a DataFrame we call rentals.
10. First rows of a DataFrame
The first step in our analysis is to inspect our rentals data. We view the first three rows of our rentals DataFrame
with the .head() method. This method shows the first five rows by default, but we can also specify the number of rows as we do here.
11. First rows of a DataFrame
At the top, the shape is printed. It shows three rows, because we used .head(3), and eight columns.
12. First rows of a DataFrame
Then we have the columns of our rentals DataFrame. Each column header shows the column name and dtype. The dtype is the type of data in the column, such as integers, floats, or strings. The first column is called name and has a string dtype.
13. First rows of a DataFrame
The other columns have further details about each property, such as its average review score, which is stored as 64-bit float values.
14. Last rows of a DataFrame
We can instead use the .tail() method
to get the last rows of the DataFrame.
15. DataFrame metadata
We can quickly access metadata about our rentals dataset with
the .shape attribute. This shows us it has 49 rows, one per property, and 8 columns.
The .columns attribute lists what we have captured about our properties.
16. DataFrame schema
The set of all column names and dtypes is called the schema.
This is given by the .schema attribute.
17. Inspecting a DataFrame
A convenient way to view the shape, schema, and first rows of our rentals DataFrame is to use the .glimpse() method.
The data is displayed in a vertical format which is especially useful if we have many columns.
18. Let's practice!
Now it's time to write some Polars code!