
Build a random forest model

Here you will use the same cross-validation data to build (using train) and evaluate (using validate) a random forest for each partition. Because these are the same cross-validation partitions you used for your regression models, you can directly compare the performance of the two kinds of model.

Note: We will limit each random forest to 100 trees so that the models finish fitting in a reasonable time. The default number of trees for ranger() is 500.

This exercise is part of the course

Machine Learning in the Tidyverse


Exercise instructions

  • Use ranger() to build a random forest predicting life_expectancy from all features in train for each cross-validation partition.
  • Add a new column, validate_predicted, containing the predicted life_expectancy for the observations in validate, using the random forest models you just created.
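
The two steps above follow a general fit-per-fold, predict-per-fold pattern. The sketch below illustrates that pattern on a stand-in dataset and is not the course solution: mtcars replaces the course data, mpg replaces life_expectancy, and the construction of cv_data with rsample's vfold_cv(), training(), and testing() is an assumption about how the partitions were prepared.

```r
library(dplyr)
library(purrr)
library(rsample)
library(ranger)

# Illustrative data only: mtcars stands in for the course data,
# and mpg stands in for life_expectancy.
set.seed(42)
cv_split <- vfold_cv(mtcars, v = 5)

# One row per fold, with list columns holding the train and validate data frames
cv_data <- cv_split %>%
  mutate(train = map(splits, ~training(.x)),
         validate = map(splits, ~testing(.x)))

# Fit one random forest per fold, using only that fold's training data
cv_models_rf <- cv_data %>%
  mutate(model = map(train, ~ranger(formula = mpg ~ ., data = .x,
                                    num.trees = 100, seed = 42)))

# Predict the held-out observations of each fold with that fold's model
cv_prep_rf <- cv_models_rf %>%
  mutate(validate_predicted = map2(.x = model, .y = validate,
                                   ~predict(.x, .y)$predictions))
```

Each element of validate_predicted is a numeric vector with one prediction per row of the matching validate data frame, so it can be compared against the actual values fold by fold.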

Interactive exercise

Try this exercise by completing the sample code below.

library(ranger)

# Build a random forest model for each fold
cv_models_rf <- cv_data %>% 
  mutate(model = map(___, ~ranger(formula = ___, data = ___,
                                    num.trees = 100, seed = 42)))

# Generate predictions using the random forest model
cv_prep_rf <- cv_models_rf %>% 
  mutate(validate_predicted = map2(.x = ___, .y = ___, ~predict(.x, .y)$predictions))