Class imbalance
In the volunteer dataset, you're thinking about trying to predict the category_desc variable using the other features in the dataset. First, though, you need to know what the class distribution (and imbalance) is for that label.
Which descriptions occur fewer than 50 times in the volunteer dataset?
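One way to check this is to count how often each label value appears and then filter on the threshold. The sketch below assumes the data is loaded into a pandas DataFrame named volunteer. The rows here are invented placeholders (the labels "A" through "D" and their counts are hypothetical), so the output says nothing about the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the volunteer dataset: the labels and
# their frequencies are made up purely to illustrate the technique.
volunteer = pd.DataFrame(
    {"category_desc": ["A"] * 60 + ["B"] * 55 + ["C"] * 10 + ["D"] * 3}
)

# Class distribution: number of rows per distinct label value.
# value_counts() ignores missing values by default; pass dropna=False
# to count them as their own class.
counts = volunteer["category_desc"].value_counts()
print(counts)

# Labels that occur fewer than 50 times.
rare = counts[counts < 50]
print(rare.index.tolist())
```

Running the same two steps on the real volunteer DataFrame answers the question above. Passing normalize=True to value_counts() returns proportions instead of raw counts, which can make the imbalance easier to read.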
This exercise is part of the course Preprocessing for Machine Learning in Python.
