Subsetting a DataFrame

1. Subsetting a DataFrame

Welcome back! Now we'll explore how to examine our vacation rentals dataset in more detail by extracting subsets of a DataFrame.

2. Selecting rows

We start by selecting rows using bracket notation. To select a single row, we put its integer row index inside the brackets. Row numbering starts at zero, so brackets 0 returns the first property in the dataset.

3. Selecting rows

We can also use negative indexing, which counts from the end of the DataFrame, so brackets minus 1 returns the last property in the dataset.

4. Selecting a range of rows

We can select a range of rows with slice notation in square brackets. With slice notation we specify the starting and ending row index separated by a colon. The start is included and the end is excluded, so when we write 1-colon-3 inside the brackets, we get the rows at index 1 and index 2, which are the second and third properties in the DataFrame.

5. Creating a Series from a DataFrame column

We can also use bracket notation to select a subset of columns. Let's say we want to see the names of our rental properties - we can put just the column name, "name", as a string inside the brackets. When we select a single column this way, we get a Polars Series instead of a DataFrame. While we generally work with DataFrames, there are times when it is useful to extract a Series, such as when creating visualizations.

6. Selecting multiple columns with brackets

We can also select multiple columns with brackets by passing a list of column names inside the brackets, which gives us double square brackets. Imagine we want to quickly compare property names and their prices - we pass a list with the "name" and "price" columns and we get a two-column DataFrame focused just on the information we need.

7. Selecting rows and columns

We can bring row and column selection together with bracket notation. The rule is rows first, then columns, separated by a comma. Suppose we want to check the first three properties and only care about their names and prices. Combining a row slice with a list of column names gives us a DataFrame with 3 rows and 2 columns containing precisely the data we need.

8. Subsetting columns with .select()

An alternative way to subset columns is with the .select() method. In this method, we pass the column names - say the name and price columns - separated by commas. The output is a DataFrame. We always get a DataFrame from the select method - even if we only pass a single column. We can also pass a list of column names to the select method to get the same result.

9. Brackets or select?

So if the select method and bracket notation can give the same result, why don't we just use brackets? Well, we'll see later in the course that the select method is more powerful than brackets because it allows us to build optimized queries in lazy mode.

10. Let's practice!

Now it's time to practice subsetting a DataFrame!