Random Forest: prediction

Now you need to make some predictions with your random forest model. The syntax is the same as with the gradient boosted trees model.

A Spark connection has been created for you as spark_conn. Tibbles attached to the training and testing datasets stored in Spark have been pre-defined as track_data_to_model_tbl and track_data_to_predict_tbl respectively. The random forest model has been pre-defined as random_forest_model.

Define a variable predicted that contains the model's predictions for the testing data.
- Call ml_predict() with the model and the testing data as arguments. This function generates predictions for the testing dataset and adds them as a new column named prediction.
Define the responses variable to prepare the data for comparing predicted responses with actual responses:
- Select the response column year.
- Collect the results.
- Use mutate() to add the predictions stored in predicted as a new column.
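
The steps above can be sketched roughly as follows. This is a minimal sketch, not the exercise's official solution: it assumes the pre-defined objects random_forest_model and track_data_to_predict_tbl described earlier, it assumes that predicted should hold just the values of the prediction column (extracted here with pull()), and the column name predicted_year is a hypothetical choice made for illustration.

```r
library(sparklyr)
library(dplyr)

# Assumption: random_forest_model and track_data_to_predict_tbl already
# exist in the session, as stated in the exercise description.

# ml_predict() returns the testing data with an extra `prediction` column;
# pull() brings that single column back to R as a plain vector.
predicted <- ml_predict(random_forest_model, track_data_to_predict_tbl) %>%
  pull(prediction)

# Keep only the actual response, collect it into R, and attach the
# predictions. `predicted_year` is a hypothetical column name.
responses <- track_data_to_predict_tbl %>%
  select(year) %>%
  collect() %>%
  mutate(predicted_year = predicted)
```

Attaching predicted with mutate() relies on the collected rows and the pulled predictions coming back in the same order. If that is a concern, selecting year and prediction together from the output of ml_predict() before collecting may be a safer way to keep each prediction paired with its actual value.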
