
Avoiding class imbalances

Some datasets have highly imbalanced outcomes, such as a dataset on a rare disease. When you split such data randomly, you might end up with a very unfortunate split. Imagine that all the rare observations land in the test set and none in the training set. That would ruin your whole training process!
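To see how much a purely random split can vary, here is a small base-R simulation on hypothetical toy data (200 rows, 10 of them "yes"). The data, the seed, and the number of repetitions are illustrative assumptions, not part of the exercise. The code repeats a random 75/25 split 1,000 times and records the share of "yes" outcomes that ends up in the test set each time.

```r
# Hypothetical toy data: 200 patients, 10 of whom (5%) have the rare outcome.
outcome <- factor(rep(c("yes", "no"), times = c(10, 190)), levels = c("no", "yes"))

set.seed(42)
n_test <- 50  # 25% of the 200 rows go to the test set

# Repeat a purely random (unstratified) 75/25 split many times and record
# the share of "yes" outcomes in the test set for each split.
test_yes_share <- replicate(1000, {
  test_idx <- sample(length(outcome), n_test)
  mean(outcome[test_idx] == "yes")
})

# Spread of the test-set "yes" share across the random splits
summary(test_yes_share)

# Number of splits whose test set contains no "yes" outcome at all
sum(test_yes_share == 0)
```

The summary shows that the test-set share of "yes" can differ noticeably from the overall 5%, and a small fraction of the splits contain no positive cases at all.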

Fortunately, the initial_split() function provides a remedy: its strata argument. In this exercise, you will first observe such a class imbalance between the training and test sets and then solve it.
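The idea behind this remedy is stratified sampling: the training rows are drawn separately within each outcome class, so each class keeps roughly the same share in both sets. The sketch below is a hand-rolled base-R version on hypothetical toy data (200 rows, 20 of them "yes"); the helper stratified_split() is illustrative and not part of rsample. With rsample itself, you would pass the outcome column to the strata argument, for example initial_split(diabetes, prop = 0.75, strata = outcome).

```r
# Hypothetical toy data: 200 patients, 20 of whom (10%) have the rare outcome.
toy <- data.frame(
  id = 1:200,
  outcome = factor(rep(c("yes", "no"), times = c(20, 180)), levels = c("no", "yes"))
)

# Hypothetical helper: sample the training rows separately within each level
# of `strata_col`, taking round(prop * n) rows from each level.
stratified_split <- function(data, strata_col, prop = 0.75) {
  idx_by_class <- split(seq_len(nrow(data)), data[[strata_col]])
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    # sample.int() avoids the sample(x) pitfall when a class has one row
    idx[sample.int(length(idx), round(prop * length(idx)))]
  }), use.names = FALSE)
  list(train = data[train_idx, , drop = FALSE],
       test  = data[-train_idx, , drop = FALSE])
}

set.seed(9888)
parts <- stratified_split(toy, "outcome", prop = 0.75)

# Share of "yes" outcomes in each set
mean(parts$train$outcome == "yes")
mean(parts$test$outcome == "yes")
```

Because 75% of each class goes to the training set, both sets end up with the same 10% share of "yes" outcomes here (15 of 150 training rows and 5 of 50 test rows), whatever the seed.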

Code is already provided to create a split object, diabetes_split, with 75% of the data in the training set and 25% in the test set.

This exercise is part of the course Machine Learning with Tree-Based Models in R.



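# Preparation
# Assumes the rsample package (initial_split(), training(), testing()) is
# installed and that the `diabetes` data frame, whose factor column
# `outcome` includes the level "yes", is already loaded.
library(rsample)

set.seed(9888)
diabetes_split <- initial_split(diabetes, prop = 0.75)

# Proportion of 'yes' outcomes in the training data
counts_train <- table(training(diabetes_split)$outcome)
prop_yes_train <- counts_train["yes"] / sum(counts_train)

# Proportion of 'yes' outcomes in the test data
counts_test <- table(testing(diabetes_split)$outcome)
prop_yes_test <- counts_test["yes"] / sum(counts_test)

paste("Proportion of positive outcomes in training set:", round(prop_yes_train, 2))
paste("Proportion of positive outcomes in test set:", round(prop_yes_test, 2))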