Dealing with uncommon categories
Some features can have many different categories but a very uneven distribution of their occurrences. Take for example Data Science's favorite languages to code in, some common choices are Python, R, and Julia, but there can be individuals with bespoke choices, like FORTRAN, C etc. In these cases, you may not want to create a feature for each value, but only the more common occurrences.
This exercise is part of the course
Feature Engineering for Machine Learning in Python
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a series out of the Country column
countries = so_survey_df.____
# Get the counts of each category
country_counts = countries.____
# Print the count values for each category
print(country_counts)