
Selecting data in pandas

1. Selecting data in pandas

We will now talk about DataFrames: two-dimensional objects whose columns can each hold a different data type.

2. Manually create DataFrame

One way to manually create a DataFrame is to pass a dictionary to the DataFrame() constructor in pandas. The keys of the dictionary become the column names, and the values become the contents of each column. Note that the order of the values matters: values at the same position in each column are aligned to form a row. You can also pass an additional argument, index, to set the row names of the DataFrame.
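As a minimal sketch of the step above: the transcript does not show the slide's data, so the values below are an assumption, chosen to agree with the results quoted later (rows x, y, z and columns A, B, with A holding 1, 2, 3). Column C is added purely for illustration.

```python
import pandas as pd

# Hypothetical data: keys become column names, lists become column values.
data = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}

# The index argument sets the row names (labels).
df = pd.DataFrame(data, index=["x", "y", "z"])
print(df)
```

The first element of each list ends up in row x, the second in row y, and so on, which is why the order of the values matters.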

3. Subsetting Columns

We will now talk about subsetting columns. To select a single column you can either use square brackets or a dot. So to extract the column A, you can either use df bracket A or df dot A. To select multiple columns, you can pass a list of column names inside square brackets. So to subset the columns A and B, you pass a list of values instead of a single value.
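A short sketch of column selection, using an assumed example DataFrame (the values are illustrative, not taken from the slides):

```python
import pandas as pd

# Assumed example data, not from the original slides.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=["x", "y", "z"])

col_brackets = df["A"]    # single column via square brackets -> Series
col_dot = df.A            # same column via dot notation -> Series
two_cols = df[["A", "B"]] # list of names -> DataFrame with columns A and B

print(col_brackets)
print(two_cols)
```

Dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so square brackets are the safer general choice.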

4. Subsetting rows

To subset rows, you first need to decide whether you want to use the row indices (i.e., row numbers) or the row labels (i.e., row names). A quick reminder: when working with indices, remember that Python starts counting from 0.

5. Subsetting rows .iloc

To extract rows using row indices, you need to use the dot iloc accessor. Since Python is 0-indexed, you use df dot iloc[0] to subset the first row of df. To select more than one row, you pass a list to dot iloc. So to select the first two rows, you pass the list [0,1] to df dot iloc. You can also add a comma followed by a colon after the row selection, as in df dot iloc[[0,1], :], to state explicitly that you want all the columns. If you leave out the comma and colon, all columns are selected by default.
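The following sketch illustrates position-based row selection on an assumed example DataFrame (the data is hypothetical):

```python
import pandas as pd

# Assumed example data, not from the original slides.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=["x", "y", "z"])

first_row = df.iloc[0]                  # first row, returned as a Series
first_two = df.iloc[[0, 1]]             # first two rows, returned as a DataFrame
first_two_explicit = df.iloc[[0, 1], :] # same rows, with all columns stated explicitly

print(first_row)
print(first_two)
```

Both forms of the two-row selection return the same DataFrame; the trailing colon only makes the column selection explicit.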

6. Subsetting rows .loc

You can also extract rows using the row labels, and to do this, use the dot loc accessor. To select the first row, which is labeled x, you use df dot loc['x']. To select more than one row, you pass a list to dot loc. So to select the rows x and y, you can pass the list ['x', 'y'] to df dot loc.
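A comparable sketch for label-based row selection, again on an assumed example DataFrame with row labels x, y, and z:

```python
import pandas as pd

# Assumed example data, not from the original slides.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=["x", "y", "z"])

row_x = df.loc["x"]          # row labeled x, returned as a Series
rows_xy = df.loc[["x", "y"]] # rows labeled x and y, returned as a DataFrame

print(row_x)
print(rows_xy)
```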

7. Multiple rows and columns

You can also use the dot loc and dot iloc accessors to subset both rows and columns simultaneously. Recall that you need to use labels when using dot loc and indices when using dot iloc. df dot loc['x', 'A'] returns 1 and if you pass the lists ['x', 'y'] and ['A', 'B'] to df dot loc, you will obtain a DataFrame consisting of rows x and y, and columns A and B.
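To make the row-and-column case concrete, here is a sketch on an assumed example DataFrame whose values were chosen so that df.loc['x', 'A'] is 1, as stated above:

```python
import pandas as pd

# Assumed example data, chosen to match the value quoted in the narration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=["x", "y", "z"])

single_value = df.loc["x", "A"]            # one cell, selected by labels
sub_loc = df.loc[["x", "y"], ["A", "B"]]   # rows x, y and columns A, B by label
sub_iloc = df.iloc[[0, 1], [0, 1]]         # the same block, selected by position

print(single_value)
print(sub_loc)
```

With this data, the label-based and position-based selections return identical DataFrames.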

8. Conditional subsetting

Finally, you can also subset DataFrames based on the result of boolean expressions. df dot A == 3 returns the values False, False and True. You can use these values to subset the DataFrame by placing df dot A == 3 inside square brackets, which keeps only the rows where the value is True. You can also combine more than one conditional expression. Use the ampersand (&) and pipe (|) operators for 'and' and 'or' operations, respectively. Make sure you surround each conditional expression in parentheses, because & and | bind more tightly than comparison operators such as ==.
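A brief sketch of boolean subsetting, assuming an example DataFrame in which column A holds 1, 2, 3 (consistent with the False, False, True result mentioned above). The specific conditions on column B are made up for illustration.

```python
import pandas as pd

# Assumed example data, not from the original slides.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=["x", "y", "z"])

mask = df.A == 3                        # boolean Series: False, False, True
matching = df[mask]                     # keeps only the rows where mask is True

both = df[(df.A > 1) & (df.B < 6)]      # 'and': both conditions must hold
either = df[(df.A == 1) | (df.B == 6)]  # 'or': at least one condition holds

print(matching)
print(both)
print(either)
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the `&` and `|` operators are used here.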

9. Attributes

Before you start working with DataFrames, let's quickly talk about attributes. Similar to methods, you access an attribute on the object, but without the parentheses. For example, shape is an attribute of a DataFrame object that holds its dimensions (the number of rows and columns).
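The difference between an attribute and a method can be sketched as follows, on an assumed 3-by-3 example DataFrame. The head() method is used here only as a contrast and is not part of the narration above.

```python
import pandas as pd

# Assumed example data, not from the original slides.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=["x", "y", "z"])

dims = df.shape      # attribute: no parentheses, gives (rows, columns)
top = df.head(2)     # method: called with parentheses, returns the first 2 rows

print(dims)
print(top)
```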

10. Let's practice!

Time to work with pandas!
