Exercise

Dealing with uncommon categories

Some features can have many different categories but a very uneven distribution of occurrences. Take, for example, the programming languages data scientists prefer to code in: common choices are Python, R, and Julia, but some individuals have bespoke choices, such as FORTRAN or C. In such cases, you may not want to create a feature for every value, only for the more common ones.
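One common way to handle this, sketched below with pandas, is to count each category, relabel the rare ones as a single "Other" category, and only then one-hot encode. The toy `languages` data and the cut-off `threshold = 2` are hypothetical assumptions for illustration, not values from the exercise.

```python
import pandas as pd

# Hypothetical toy data: favourite languages of a few respondents
languages = pd.Series(
    ["Python", "R", "Python", "Julia", "Python", "R", "Julia", "FORTRAN", "C"],
    name="Language",
)

# Count how often each category occurs
counts = languages.value_counts()

# Assumed cut-off: categories seen fewer than 2 times count as uncommon
threshold = 2
rare = counts[counts < threshold].index

# Keep common values as-is and replace uncommon ones with "Other"
collapsed = languages.where(~languages.isin(rare), "Other")

# One-hot encode: one column per common category plus a single "Other" column
dummies = pd.get_dummies(collapsed, prefix="Language")
print(dummies.columns.tolist())
```

Grouping rare values first keeps the number of generated columns small; the threshold is a judgment call that depends on the dataset size.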

Instructions 1/3
  • Extract the Country column of so_survey_df as a Series and assign it to countries.
  • Find the counts of each category in the newly created countries series.
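A minimal sketch of these two steps, assuming `so_survey_df` is a pandas DataFrame with a `Country` column. The small stand-in DataFrame and the name `country_counts` are hypothetical; in the exercise, `so_survey_df` is already loaded.

```python
import pandas as pd

# Hypothetical stand-in for so_survey_df; the real DataFrame is provided by the exercise
so_survey_df = pd.DataFrame(
    {"Country": ["South Africa", "USA", "USA", "UK", "South Africa", "USA"]}
)

# Extract the Country column as a Series
countries = so_survey_df["Country"]

# Count the occurrences of each category, most frequent first
country_counts = countries.value_counts()
print(country_counts)
```

Because `value_counts()` sorts from most to least frequent, the uncommon categories end up at the bottom of the result.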