
From zero to hero

You have mastered creating a model specification and splitting the data into training and test sets, and you know how to keep the outcome classes balanced across that split. Now it's time to combine what you learned in the preceding lesson and build your model using only the training set!
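Here is a minimal sketch of what a balanced (stratified) split does, assuming the tidymodels packages are installed. The data frame `toy` is a hypothetical, randomly generated stand-in for `diabetes` with roughly 30% "yes" outcomes. Passing `strata = outcome` to `initial_split()` samples within each outcome class, so the share of "yes" rows should come out nearly the same in the full data, the training set and the test set.

```r
library(tidymodels)

# Hypothetical toy data standing in for the course's diabetes table
# (synthetic, for illustration only): about 30% "yes" outcomes
set.seed(42)
toy <- tibble(
  bmi = rnorm(200, mean = 32, sd = 7),
  skin_thickness = rnorm(200, mean = 20, sd = 10),
  outcome = factor(sample(c("yes", "no"), 200, replace = TRUE, prob = c(0.3, 0.7)))
)

# strata = outcome: sample 75% within each outcome class separately,
# so both sets keep roughly the same yes/no ratio as the full data
toy_split <- initial_split(toy, prop = 0.75, strata = outcome)

# Share of "yes" rows in a data frame
prop_yes <- function(df) mean(df$outcome == "yes")

c(full  = prop_yes(toy),
  train = prop_yes(training(toy_split)),
  test  = prop_yes(testing(toy_split)))
```

Without `strata`, a purely random split can, by chance, put noticeably more of the minority class into one of the two sets. This is most likely to matter for small data sets.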

You are going to build a proper machine learning pipeline. It consists of creating a model specification, splitting your data into training and test sets, and, last but not least, fitting a model to the training data. Enjoy!
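The three steps can be sketched end to end on synthetic data, assuming tidymodels is installed. The names `toy`, `toy_split`, `toy_train` and `toy_model` are hypothetical. The outcome is simulated to depend on `bmi` so that the tree has a signal to pick up, and this is not the course's diabetes data.

```r
library(tidymodels)

# Hypothetical synthetic data (illustration only): outcome depends on bmi
set.seed(9)
toy <- tibble(
  bmi = rnorm(300, mean = 32, sd = 7),
  skin_thickness = rnorm(300, mean = 20, sd = 10)
)
toy$outcome <- factor(ifelse(toy$bmi + rnorm(300, sd = 4) > 34, "yes", "no"))

# 1. Model specification: model type, engine and task, no data yet
tree_spec <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("classification")

# 2. Split: 75% training, stratified by the outcome
toy_split <- initial_split(toy, prop = 0.75, strata = outcome)
toy_train <- training(toy_split)

# 3. Fit the specification to the training rows only
toy_model <- tree_spec %>%
  fit(outcome ~ bmi + skin_thickness, data = toy_train)

toy_model
```

The specification alone does not touch any data. `tree_spec` only records the model type, engine and mode, and training happens in `fit()`, which here sees only the training rows. The test rows stay untouched for later evaluation.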

This exercise is part of the course

Machine Learning with Tree-Based Models in R


Exercise instructions

  • Create diabetes_split, a split where the training set contains three-quarters of all diabetes rows and where training and test sets have a similar distribution in the outcome variable.
  • Build a decision tree specification for your model using the rpart engine and save it as tree_spec.
  • Fit a model, model_trained, to the training data of diabetes_split, with outcome as the target variable and bmi and skin_thickness as the predictors.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

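library(tidymodels)  # initial_split(), training(), decision_tree(), fit()

# `diabetes` is preloaded in the exercise environment
set.seed(9)

# Create the balanced data split: 3/4 for training, stratified by outcome
diabetes_split <- initial_split(diabetes, prop = 0.75, strata = outcome)

# Build the specification of the model
tree_spec <- decision_tree() %>% 
  set_engine("rpart") %>% 
  set_mode("classification")

# Train the model on the training set only
model_trained <- tree_spec %>% 
  fit(formula = outcome ~ bmi + skin_thickness, 
      data = training(diabetes_split))

model_trained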