Aan de slagGa gratis aan de slag

Dealing with uncommon categories

Some features can have many different categories but a very uneven distribution of their occurrences. Take for example Data Science's favorite languages to code in, some common choices are Python, R, and Julia, but there can be individuals with bespoke choices, like FORTRAN, C etc. In these cases, you may not want to create a feature for each value, but only the more common occurrences.

Deze oefening maakt deel uit van de cursus

Feature Engineering for Machine Learning in Python

Cursus bekijken

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

# Create a series out of the Country column
countries = so_survey_df.____

# Get the counts of each category
country_counts = countries.____

# Print the count values for each category
print(country_counts)
Code bewerken en uitvoeren