
Setting category variables

1. Setting category variables

To get the most out of using the pandas categorical dtype, we need to understand how to set, add, and remove categories.

2. New dataset: adoptable dogs

Before we begin, let's check out another interesting dataset. The adoptable dogs dataset contains information on 2937 adoptable dogs and has a lot of great categorical columns for us to explore.

3. A dog's coat

Let's start by converting the coat variable to a category using the astype method, and then check the frequency distribution using the value-counts method. We set the dropna parameter to False so that any missing entries are included in the counts. We see that a short coat is the most common, while a long coat is the least common.
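
As a rough sketch of this step, the snippet below uses a tiny made-up DataFrame in place of the real adoptable dogs data, so the counts it prints will not match the ones described in the lesson. The name dogs and the sample values are assumptions for illustration only.

```python
import pandas as pd

# Made-up stand-in for the adoptable dogs data (not the real dataset).
dogs = pd.DataFrame(
    {"coat": ["short", "short", "medium", "wirehaired", "long", None, "short"]}
)

# Convert the column to the categorical dtype.
dogs["coat"] = dogs["coat"].astype("category")

# dropna=False keeps missing entries in the frequency table.
coat_counts = dogs["coat"].value_counts(dropna=False)
print(coat_counts)
```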

4. The .cat accessor object

We are going to use the dot-cat accessor object a lot in this chapter. This object lets us access and manipulate the categories of a categorical Series. Most of the methods we will introduce use the following parameters: new-categories, which is a list of new categories for the Series; inplace, which is a Boolean for whether or not the method should overwrite the current Series; and ordered, which is a Boolean for whether or not the new Series should be treated as an ordered categorical. Note that pandas 2.0 and later no longer accept the inplace parameter on these methods, so there the result has to be assigned back to the Series. Our first example of using this object and these parameters will be setting new categories.

5. Setting Series categories

cat-dot-set-categories is used to set specific categories for a Series. Any values not listed in the new-categories list will be set to NaN; the rows themselves stay in the Series. Checking the value counts of this Series again, we see that the wirehaired responses have been set to NaN. This happens because the wirehaired category is not listed in the new-categories parameter and is no longer recognized.
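
A minimal sketch of setting categories, assuming a small made-up Series rather than the real coat column. The result is assigned back to the variable, which works regardless of whether the installed pandas version still supports inplace.

```python
import pandas as pd

# Made-up coat values for illustration only.
coat = pd.Series(
    ["short", "medium", "wirehaired", "long", "short"], dtype="category"
)

# Keep only three categories; "wirehaired" is left out on purpose.
coat = coat.cat.set_categories(new_categories=["short", "medium", "long"])

# The former "wirehaired" row now shows up as NaN.
print(coat.value_counts(dropna=False))
```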

6. Setting order

We can set the order of the categories using the ordered parameter. Checking the head of the pandas Series shows us that the Series now knows the categories have a specific order.
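
The ordering step might look like the sketch below, again on made-up values. Passing ordered=True tells pandas that the categories run from first to last in the order given, which the printed dtype line reflects.

```python
import pandas as pd

# Made-up coat values for illustration only.
coat = pd.Series(["short", "medium", "long", "short"], dtype="category")

# List the categories from shortest to longest and mark them as ordered.
coat = coat.cat.set_categories(
    new_categories=["short", "medium", "long"], ordered=True
)

# The head of the Series now reports an ordered categorical dtype.
print(coat.head(3))
```

Once the Series is ordered, comparisons such as coat.max() follow the category order rather than alphabetical order.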

7. Missing categories

In the likes-people column, there are 938 rows without a response. Maybe the dog shelter did not check, or maybe they checked and could not tell. Let's add a couple of categories to clean this up.

8. Adding categories

We can add two categories using the cat-dot-add-categories method. Here we have added two categories, to help clarify what a missing value actually means. Notice that categories not listed in the new-categories parameter are not replaced with NaN values this time and are simply left alone. We can check the final categories using cat-dot-categories on our pandas Series. Awesome - both categories were added and can now be used in this Series.
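
Here is a small sketch of adding categories. The transcript does not name the two new categories, so the labels "did not check" and "could not tell" are assumptions taken from the wording of the previous slide, and the likes_people values are made up.

```python
import pandas as pd

# Made-up responses; None stands for a missing response.
likes_people = pd.Series(["yes", "no", None, "yes", None], dtype="category")

# Add two hypothetical categories; existing values are left alone.
likes_people = likes_people.cat.add_categories(
    new_categories=["did not check", "could not tell"]
)

# Inspect the final set of categories.
print(likes_people.cat.categories)
```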

9. New categories

Although we added categories, this doesn't mean any rows of our data were set to these categories. Checking the value counts one more time verifies this. We will learn how to update values in a different lesson.
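
To see this in a self-contained way, the sketch below adds one hypothetical category to a made-up Series and then counts values. The new category appears in the table with a count of zero, and the missing rows are still missing.

```python
import pandas as pd

# Made-up responses; None stands for a missing response.
likes_people = pd.Series(["yes", "no", None, "yes"], dtype="category")
likes_people = likes_people.cat.add_categories(new_categories=["did not check"])

# Unused categories are listed with a count of 0.
counts = likes_people.value_counts(dropna=False)
print(counts)
```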

10. Removing categories

We can also remove categories using the cat-dot-remove-categories method. This method takes a list of categories to remove using the removals parameter. In this example, we remove the wirehaired category altogether. This also means that all wirehaired values will be set to NaN.
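
A minimal sketch of removing a category, using made-up coat values. As before, the result is assigned back to the variable rather than relying on inplace.

```python
import pandas as pd

# Made-up coat values for illustration only.
coat = pd.Series(["short", "wirehaired", "long", "wirehaired"], dtype="category")

# Remove the category; rows that held it become NaN.
coat = coat.cat.remove_categories(removals=["wirehaired"])

print(coat.value_counts(dropna=False))
```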

11. Methods recap

Let’s recap the methods covered in this lesson. We first learned how to set categories using the set-categories method, which sets any values whose categories are not specified to NaN. Add-categories can be used to add new categories, and categories not specified are left alone. Finally, remove-categories removes the listed categories and sets matching values to NaN.

12. Practice updating categories

Let's work through a few examples of setting, adding, and removing categories.