
Dealing with uncommon categories

Some features can have many different categories but a very uneven distribution of their occurrences. Take, for example, data scientists' favorite languages to code in: common choices are Python, R, and Julia, but some individuals have bespoke choices, like FORTRAN, C, etc. In these cases, you may not want to create a feature for each value, but only for the more common ones.
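As a rough illustration of the idea, the sketch below counts how often each category occurs, relabels the rare ones as a single "Other" category, and then one-hot encodes the result. The toy data, the `min_count` threshold, the "Other" label, and the `lang` prefix are all assumptions made for this example, not part of the exercise.

```python
import pandas as pd

# Hypothetical toy data standing in for a survey column
languages = pd.Series(
    ["Python", "R", "Python", "Julia", "Python",
     "R", "FORTRAN", "C", "Julia", "Python"],
    name="language",
)

# Count how often each category occurs
counts = languages.value_counts()

# Treat categories seen fewer than min_count times as rare
# (the threshold is an arbitrary choice for this sketch)
min_count = 2
rare = counts[counts < min_count].index

# Keep common categories and replace rare ones with a single "Other" label
grouped = languages.where(~languages.isin(rare), "Other")

# One-hot encode the reduced set of categories
encoded = pd.get_dummies(grouped, prefix="lang")

print(grouped.value_counts())
print(list(encoded.columns))
```

The right threshold depends on the dataset: set it too high and informative categories get merged away, set it too low and the encoded feature matrix stays wide and sparse.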

This exercise is part of the course

Feature Engineering for Machine Learning in Python


Hands-on interactive exercise

Try this exercise by completing this sample code.

# Create a series out of the Country column
countries = so_survey_df.____

# Get the counts of each category
country_counts = countries.____

# Print the count values for each category
print(country_counts)