Exercise

Random Forest: prediction

Now you need to make some predictions with your random forest model. The syntax is the same as with the gradient boosted trees model.

Instructions

A Spark connection has been created for you as spark_conn. Tibbles attached to the training and testing datasets stored in Spark have been pre-defined as track_data_to_model_tbl and track_data_to_predict_tbl respectively. The random forest model has been pre-defined as random_forest_model.

  • Define a variable predicted that contains the model's predictions for our testing data.
    • Call ml_predict() with the model and the testing data as arguments. This function will generate predictions for the testing dataset and add these as a new column named prediction.
  • Define the responses variable to prepare the data for comparing predicted responses with actual responses:
    • Select the response column year.
    • Collect the results.
    • Use mutate() to add the predictions stored in predicted as a new column.
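
The steps above can be sketched as follows. This is a minimal sketch, not the official solution: it assumes the pre-defined objects from the exercise (track_data_to_predict_tbl and random_forest_model) exist in the session, that predicted should hold the prediction column as a plain vector (extracted here with pull()), and that the new column is called predicted_year, which is a name chosen for illustration.

```r
library(sparklyr)
library(dplyr)

# Assumption: random_forest_model and track_data_to_predict_tbl are already
# defined in the session, as stated in the exercise instructions.

# ml_predict() returns the testing data with an extra column named `prediction`.
# pull() brings that single column back into R as a vector (assumed step).
predicted <- ml_predict(random_forest_model, track_data_to_predict_tbl) %>%
  pull(prediction)

# Build a local tibble holding the actual and the predicted responses.
responses <- track_data_to_predict_tbl %>%
  select(year) %>%                    # keep only the response column
  collect() %>%                       # copy the results from Spark into R
  mutate(predicted_year = predicted)  # `predicted_year` is a hypothetical name
```

Combining a collected column with a separately pulled vector relies on both coming back in the same row order. Since the output of ml_predict() already contains both year and prediction, selecting those two columns from it before calling collect() is an alternative that avoids this dependency.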