Get startedGet started for free

Modifying imports: true/false data

1. Modifying imports: true/false data

In this course, you've mostly handled string and numeric data. This lesson focuses on another data type, Booleans, and special considerations for working with them.

2. Boolean Data

A Boolean variable has only two possible values: True or False, which makes them convenient for tasks like filtering. Despite this simplicity, Booleans can be tricky. We'll use a subset of the New Developer Survey data to focus on them. This version only has an ID column,

3. Boolean Data

and columns for whether the respondent attended a programming bootcamp

4. Boolean Data

and if they took out a loan for it.

5. Boolean Data

True and false are represented in a few ways for demonstration purposes:

6. Boolean Data

zeros and ones, which are common among people with coding experience,

7. Boolean Data

Trues and Falses,

8. Boolean Data

and yeses and nos, which tend to appear in surveys and forms.

9. pandas and Booleans

Let's load this data with no additional arguments and check dtypes. pandas interpreted no columns as Boolean! Even True/False columns were loaded as floats. Let's investigate.

10. pandas and Booleans

First, let's sum the dataframe's columns to see how many Trues each float column has. Recall that these columns code True as 1 and False as 0. In our data subset, 38 attended a bootcamp and 14 took out a loan for it. Let's also check how many values in each column are missing by summing the results of is NA. Every record has a value for bootcamp attendance, but most of the loan values are blank, even for some students who attended a bootcamp.

11. pandas and Booleans

Now let's cast these columns as Booleans with the dtype argument. read Excel's dtype works exactly like read CSV's, so we pass a dictionary specifying which columns should be Boolean. Checking dtypes, it looks like it worked.

12. pandas and Booleans

Checking counts of True values reveals issues. The loan columns have too many Trues, and the yes/no ones are all True. Checking NA values by column, we see there aren't any.

13. pandas and Booleans

What happened? pandas automatically loads True/False columns as floats, but that can be changed with dtype. Boolean values must be either True or False, so NAs were re-coded as True. While pandas recognized that zeros and ones are False and True, respectively, it did not know what to do with Yes and No, so they were all coded as True.

14. Setting Custom True/False Values

We can solve the issue of the yes/no columns being misinterpreted with read Excel's true values and false values arguments. Each takes a list of values that pandas should treat as True or False when they appear in Boolean columns. Values in non-Boolean columns are not converted.

15. Setting Custom True/False Values

Let's pass "Yes" and "No" as single-item lists to true values and false values.

16. Setting Custom True/False Values

Then, we check True counts with the sum method. Now the yes/no columns match their counterparts. But there is still the issue of NA's being coded as True.

17. Boolean Considerations

What to do depends on the data. In our case, we don't want fake Trues, so we might decide to keep the loan columns as floats. Things to consider when casting a column as Boolean include the presence of NA values, how the column will be used in the analysis, the consequences of fake True values, and whether alternative representations like floats would do.

18. Let's practice!

You just learned how to cast columns as Boolean and set custom True and False values. Importantly, you also learned what to consider before doing so. Now, it's time to practice. Good luck!