1. Updating categories
Now that we understand how to create, add, and remove categories in Pandas, let's work on updating and collapsing categories.
2. The breed variable
Take a look at the categories found in the breed column of the dog dataset. The most common category is Unknown Mix. We can rename this category to just be Unknown.
3. Renaming categories
An easy way to do this is with the cat-dot-rename-categories method. If we supply a dictionary of key-value pairs, with the key being the current category, and the value being the desired new category, we can rename categories quickly. Note that this method does not require a dictionary, but we are using one here for clarity.
Let's first make a dictionary so we can map the Unknown Mix category to just be Unknown. Next, we update the breed column using cat-dot-rename-categories and passing the dictionary of changes.
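Here is a minimal sketch of those two steps. The small dogs DataFrame and the name my_changes are stand-ins made up for illustration, since the real dataset is not shown here.

```python
import pandas as pd

# Hypothetical stand-in for the dog dataset; the real data is not shown here.
dogs = pd.DataFrame(
    {"breed": ["Unknown Mix", "German Shepherd Dog", "Unknown Mix", "Dachshund"]}
)
dogs["breed"] = dogs["breed"].astype("category")

# Keys are the current category names, values are the desired new names.
my_changes = {"Unknown Mix": "Unknown"}

# rename_categories returns a new Series, so assign it back to the column.
dogs["breed"] = dogs["breed"].cat.rename_categories(my_changes)

print(dogs["breed"].value_counts())
```

Categories that are not listed in the dictionary keep their current names.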
4. The updated breed variable
Notice that Unknown Mix has been changed to Unknown, but it still has 1,524 responses. The cat-dot-rename-categories method can also rename more than one category at a time: just make a bigger dictionary!
5. Renaming categories with a function
Another nice feature of the cat-dot-rename-categories method is that you can also use a lambda function to update categories. We won't cover lambda functions in this course, but we will show one here so you can see it in action. Let's convert both male and female to title case. Using the title method, we convert each category in the sex variable to title case. We now have Female and Male as categories.
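As a rough sketch of the lambda approach, assume a small made-up sex Series in place of the real column:

```python
import pandas as pd

# Hypothetical stand-in for the sex column of the dog dataset.
sex = pd.Series(["male", "female", "female", "male"], dtype="category")

# The function is called once per category name, not once per row.
sex = sex.cat.rename_categories(lambda c: c.title())

print(sex.cat.categories)
```

Because the function runs on the category names only, the Series stays categorical and the rows are relabeled in one step.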
6. Common replacement issues
This method does come with two key issues. First, the new category must not currently be in the list of categories. If Unknown was already a category, we would not be able to rename Unknown Mix to Unknown. And second, we can't use this method to collapse categories. If we wanted both Unknown Mix and Mixed Breed to become the Unknown category, the rename would fail, because two categories would end up with the same name and category names must be unique.
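Both issues can be reproduced on a small made-up Series. The data, the errors list, and the name Other are assumptions for illustration only; in both cases pandas raises a ValueError because the resulting category names would not be unique.

```python
import pandas as pd

# Hypothetical breed values; "Unknown" is already one of the categories.
breed = pd.Series(
    ["Unknown Mix", "Unknown", "Mixed Breed", "Unknown Mix"], dtype="category"
)

errors = []

# Issue 1: the new name already exists as a category.
try:
    breed.cat.rename_categories({"Unknown Mix": "Unknown"})
except ValueError as exc:
    errors.append(f"existing name: {exc}")

# Issue 2: two categories mapped to the same new name (a collapse).
try:
    breed.cat.rename_categories({"Unknown Mix": "Other", "Mixed Breed": "Other"})
except ValueError as exc:
    errors.append(f"collapse: {exc}")

for message in errors:
    print(message)
```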
7. Collapsing categories setup
So how do we collapse categories? Let's look at the dog hair color Series as an example. Dog hair can be many different colors. It might make sense for us to make a new categorical column that just has a dog's main or primary color, instead of all of the combinations of colors.
8. Collapsing categories example
We start by making a dictionary of all of the categories we want to collapse. Here we take every category that pairs black with one additional color and collapse it to the primary color, black. We use the dot-replace method to apply each key-value pair listed in the update-colors dictionary.
This method, however, does not preserve the categorical data type and does not use the dot-cat accessor object. What it is really doing is replacing each key in our dictionary with its corresponding value, but the method is matching strings, not categories. If we check the dtype of our new column, main-color, we see it is now object, not category.
9. Convert back to categorical
Any time you update the underlying strings of a categorical Series, you will need to convert the Series back to a categorical dtype using the astype method and specifying category. If we check the categories of our new Series, we see that black is one of them, but black and brown, black and tan, and black and white are all gone.
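The whole collapse can be sketched end to end on a small made-up coat Series. One assumption to flag: this sketch casts the column to plain strings before calling replace, because how replace treats a categorical column can differ between pandas versions. The names coat, update_colors, and main_color are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the dog hair color column.
coat = pd.Series(
    ["black", "black and brown", "black and tan", "black and white", "brown"],
    dtype="category",
)

# Every "black and ..." combination collapses to the primary color.
update_colors = {
    "black and brown": "black",
    "black and tan": "black",
    "black and white": "black",
}

# Assumption: cast to strings first so replace matches plain string values.
main_color = coat.astype(str).replace(update_colors)
print(main_color.dtype)  # a string-like dtype, not category

# Convert back to a categorical dtype.
main_color = main_color.astype("category")
print(main_color.cat.categories)
```

After the final astype call, only the collapsed color names remain as categories.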
10. Practice time
Let's practice updating categories.