Machine Learning with Tree-Based Models in R

Exercise

Avoiding class imbalances

Some datasets have very imbalanced outcomes, such as a rare disease dataset. With a purely random split, you might end up with a very unfortunate result. Imagine all the rare observations land in the test set and none in the training set. The model would never see the rare class during training, which would ruin the whole training process!

Fortunately, the initial_split() function provides a remedy. You are going to observe and solve these so-called class imbalances in this exercise.

Code is already provided that creates a split object diabetes_split with a 75% training and 25% test split.
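The provided code is not shown here, but a 75/25 split of this kind can be sketched as follows. This is only an illustration under assumptions: the diabetes data frame below is a hypothetical stand-in built on the spot, and the column names outcome and glucose are assumed rather than taken from the course data.

```r
library(rsample)

# Hypothetical stand-in for the course's diabetes data (assumed columns):
# 140 "yes" and 260 "no" outcomes plus one made-up numeric predictor.
set.seed(1)
diabetes <- data.frame(
  outcome = factor(rep(c("yes", "no"), times = c(140, 260))),
  glucose = rnorm(400, mean = 120, sd = 30)
)

# Purely random split: 75% of the rows go to training, the rest to testing.
diabetes_split <- initial_split(diabetes, prop = 0.75)

# training() and testing() extract the two data frames from the split object.
n_train <- nrow(training(diabetes_split))
n_test  <- nrow(testing(diabetes_split))
n_train
n_test
```

Because the rows are drawn at random, nothing in this split forces the share of "yes" outcomes to match between the two sets.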

Instructions 1/2

  • 1
    • Count the proportion of "yes" outcomes in the training and test sets of diabetes_split.
  • 2
    • Redesign diabetes_split using the same training/testing proportion, but with the outcome variable equally distributed in both sets.
    • Count the proportion of "yes" outcomes in both datasets.
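Both steps can be sketched as below. This is not the course's reference solution: it assumes the remedy meant here is the strata argument of initial_split(), uses a hypothetical stand-in for the diabetes data with an assumed outcome column, and computes proportions with a small helper, prop_yes(), introduced only for this illustration.

```r
library(rsample)

# Hypothetical stand-in for the course's diabetes data (assumed column name).
set.seed(1)
diabetes <- data.frame(
  outcome = factor(rep(c("yes", "no"), times = c(140, 260))),
  glucose = rnorm(400, mean = 120, sd = 30)
)

# Illustrative helper: share of rows whose outcome is "yes".
prop_yes <- function(data) mean(data$outcome == "yes")

# Step 1: random split, then compare the "yes" share in each set.
diabetes_split <- initial_split(diabetes, prop = 0.75)
train_random <- prop_yes(training(diabetes_split))
test_random  <- prop_yes(testing(diabetes_split))

# Step 2: same proportion, but sampled within each outcome class,
# so both sets keep roughly the same "yes" share.
diabetes_split_strat <- initial_split(diabetes, prop = 0.75, strata = outcome)
train_strat <- prop_yes(training(diabetes_split_strat))
test_strat  <- prop_yes(testing(diabetes_split_strat))

c(train_random = train_random, test_random = test_random)
c(train_strat = train_strat, test_strat = test_strat)
```

With stratification, the 75/25 sampling happens separately inside the "yes" and "no" groups, so the two proportions should be nearly identical, whereas the random split can drift apart by chance.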