Introduction to the Polars DataFrame

1. Introduction to the Polars DataFrame

Hi, I'm Liam.

2. Meet your instructor

I'm an experienced data scientist and Polars contributor, and I'll guide you through working with Polars.

3. Tabular data

Like pandas, Polars is designed to work with tabular data, which is structured in rows and columns. We'll use this vacation rentals dataset to explore working with Polars!

4. Differences between Polars and Pandas

While Polars and pandas both provide DataFrame functionality, Polars is typically faster than pandas because it makes greater use of parallel computation. It also has a lazy mode that optimizes a query before running it. We'll learn more about lazy mode later in the course.

5. Underneath the hood

Polars is written in Rust and built on the Apache Arrow columnar memory format. Arrow stores tabular data efficiently in memory, and Rust, a compiled language, allows for fast processing. Together, these two technologies give Polars its speed advantage over libraries like pandas.

6. Course outline

In this course, we'll learn how to work with data using Polars. We'll start by seeing how to load a CSV file into a DataFrame and explore its contents.

7. Course outline

We'll then learn how to transform data and build optimized queries with lazy mode.

8. Course outline

Finally, we'll analyze data with filtering and aggregation.

9. Reading a CSV

Now let's create a DataFrame. We start by importing polars as pl. We use the pl.read_csv() function to load our vacation properties dataset into a DataFrame we call rentals.

10. First rows of a DataFrame

The first step in our analysis is to inspect our rentals data. We view the first three rows of our rentals DataFrame with the .head() method. This method shows the first five rows by default, but we can also specify the number of rows as we do here.

11. First rows of a DataFrame

At the top, the shape is printed: three rows, because we used .head(3), and eight columns.

12. First rows of a DataFrame

Below that come the columns of our rentals DataFrame. Each column header shows the column name and its dtype, which is the type of data in the column, such as integers, floats, or strings. The first column is called name and has a string dtype, shown as str.

13. First rows of a DataFrame

The other columns hold further details about each property, such as its average review score, stored as 64-bit floats.

14. Last rows of a DataFrame

We can instead use the .tail() method to get the last rows of the DataFrame.

15. DataFrame metadata

We can quickly access metadata about our rentals dataset. The .shape attribute shows that it has 49 rows, one per property, and 8 columns. The .columns attribute lists the column names, which tell us what information we have recorded about each property.

16. DataFrame schema

The set of all column names and dtypes is called the schema. This is given by the .schema attribute.

17. Inspecting a DataFrame

A convenient way to view the shape, schema, and first values of our rentals DataFrame together is the .glimpse() method. It displays the data vertically, with one line per column, which is especially useful when we have many columns.

18. Let's practice!

Now it's time to write some Polars code!
