Random Forest: modeling

Like gradient boosted trees, random forests are another form of ensemble model. That is, they combine lots of simpler models (decision trees, again) into a single better model. Rather than building trees one after another, random forests fit lots of separate trees in parallel, each on a randomly chosen subset of the data, with a randomly chosen subset of features. The forest then makes its final prediction by aggregating the results from the individual trees (averaging them for regression, taking a majority vote for classification).
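
To make the sample-then-aggregate idea concrete, here is a minimal sketch in plain R, without Spark. It assumes the rpart package is installed and uses the built-in mtcars dataset as a stand-in for real data. It is a simplification: features are drawn once per tree, whereas typical random forest implementations draw a fresh feature subset at each split. It is not how sparklyr or Spark implement the model internally.

```r
library(rpart)  # assumption: rpart is available (it ships with most R installs)

set.seed(42)
n_trees <- 25
features <- setdiff(names(mtcars), "mpg")

# Fit each tree on a bootstrap sample of rows and a random subset of features
trees <- lapply(seq_len(n_trees), function(i) {
  rows <- sample(nrow(mtcars), replace = TRUE)  # random subset of the data
  cols <- sample(features, size = 3)            # random subset of features
  rpart(reformulate(cols, response = "mpg"), data = mtcars[rows, ])
})

# Aggregate: for regression, average the predictions of the individual trees
per_tree <- sapply(trees, predict, newdata = mtcars)  # one column per tree
forest_prediction <- rowMeans(per_tree)
head(forest_prediction)
```

Because every tree is fitted independently of the others, the calls inside lapply() could run in parallel, which is what distinguishes this approach from the sequential fitting used in boosting.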

sparklyr's random forest function is called ml_random_forest(). Its usage is exactly the same as ml_gradient_boosted_trees() (see the first exercise of this chapter for a reminder on syntax).

This exercise is part of the course

Introduction to Spark with sparklyr in R

Exercise instructions

A Spark connection has been created for you as spark_conn. A tibble attached to the combined and filtered track metadata/timbre data stored in Spark has been pre-defined as track_data_to_model_tbl.

  • Repeat your year prediction analysis, using a random forest model this time.
    • Get the timbre columns from track_data_to_model_tbl and assign the result to feature_colnames.
    • Create the formula for the model using reformulate().
    • Run the random forest model and assign the result to random_forest_model.
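
The first two steps above can be sketched in plain R without a Spark connection. The column names below are hypothetical stand-ins for what colnames(track_data_to_model_tbl) would return. The final model call is shown only as a comment, since it assumes sparklyr is loaded and the pre-defined track_data_to_model_tbl exists.

```r
# Hypothetical column names standing in for colnames(track_data_to_model_tbl)
all_colnames <- c("artist_name", "year", "timbre_means1", "timbre_means2",
                  "timbre_covariances1")

# Keep only the columns whose names contain "timbre"
feature_colnames <- grep("timbre", all_colnames, fixed = TRUE, value = TRUE)

# Build the model formula: year ~ timbre_means1 + timbre_means2 + ...
year_formula <- reformulate(feature_colnames, response = "year")
print(year_formula)

# With the exercise's Spark tibble available, the model step would look like
# this (not run here; assumes library(sparklyr) and track_data_to_model_tbl):
# random_forest_model <- track_data_to_model_tbl %>%
#   ml_random_forest(year_formula)
```

reformulate() saves typing out a long right-hand side by hand, which matters when there are many timbre columns.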

Interactive exercise

Try this exercise by completing this sample code.

# track_data_to_model_tbl has been pre-defined
track_data_to_model_tbl

# Get the timbre columns
feature_colnames <- ___

# Create the formula for the model
year_formula <- ___

# Run the random forest model
random_forest_model <- ___