Generating Predictions and Residuals

Extract model predictions using rxPredict().

This exercise is part of the course

Big Data Analysis with Revolution R Enterprise

Exercise instructions

Use rxPredict() to generate arrival delay predictions for the model we created in the prior exercise (myLM2).

Before making predictions, summarize myLM2 to refresh your memory of the model and its results.

rxPredict() is the RevoScaleR function that allows us to generate predictions and residuals based on a variety of different models.

The syntax is: rxPredict(modelObject, data, outData, computeResiduals, …)

  • modelObject - The model you would like to use in order to generate predictions.
  • data - The data for which you want to make predictions.
  • outData - The location where you would like to store the predictions (and the residuals, if requested).
  • computeResiduals - Whether to compute residuals or not.
  • … - Additional arguments.

We need to be careful here: rxPredict() generates as many predictions as there are observations in the dataset. If you generate predictions for a billion observations, the output will also contain a billion predictions. Because of this, the output is by default not stored in memory but written to an xdf file.
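The one-prediction-per-observation point can be checked at small scale. The sketch below is a base R analogue, not RevoScaleR itself: it assumes the built-in mtcars data and uses lm() and predict() as stand-ins for a RevoScaleR model and rxPredict(). It shows that the prediction vector is as long as the data, and that each residual is the observed value minus the predicted value.

```r
# Base R stand-in (not RevoScaleR): mtcars and lm() are assumptions for illustration.
fit <- lm(mpg ~ wt, data = mtcars)

# One prediction per row of the input data.
preds <- predict(fit, newdata = mtcars)

# Residual = observed response minus predicted response.
resids <- mtcars$mpg - preds

# The number of predictions matches the number of observations.
n_obs <- nrow(mtcars)
n_preds <- length(preds)
same_length <- n_obs == n_preds

# The hand-computed residuals agree with those stored in the fitted model.
matches_lm <- isTRUE(all.equal(unname(resids), unname(residuals(fit))))
```

With a billion rows the same vectors would not fit comfortably in memory, which is the motivation for writing the output to an xdf file instead.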

Go ahead and create a new xdf file to store the predicted values for our original dataset. Like other RevoScaleR functions, rxPredict() can take additional arguments that control which variables are kept when a new dataset is created. Since we are creating our own copy of the dataset, we should also specify the writeModelVars argument so that the values of the predictor variables are included in the new prediction file.

After generating predictions with rxPredict() and inspecting the new file with rxGetInfo(), use the same approach to generate the residuals.
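The workflow described above might look like the following sketch. It assumes RevoScaleR is installed and that myLM2 from the prior exercise exists in the session. The names airData (the dataset the model was fit on), predPath, and the file name "myPredictions.xdf" are hypothetical placeholders, not names given by the course. The RevoScaleR calls only run when those assumptions hold.

```r
# Sketch only: airData, predPath and the file name are hypothetical placeholders.
predPath <- file.path(tempdir(), "myPredictions.xdf")

# Run the RevoScaleR steps only if the package, model, and data are available.
can_run <- requireNamespace("RevoScaleR", quietly = TRUE) &&
  exists("myLM2") && exists("airData")

if (can_run) {
  library(RevoScaleR)

  # Refresh your memory of the model.
  print(summary(myLM2))

  # Predictions are written to the xdf file rather than held in memory.
  # writeModelVars = TRUE copies the model variables into the output file.
  rxPredict(modelObject = myLM2, data = airData, outData = predPath,
            writeModelVars = TRUE, overwrite = TRUE)
  print(rxGetInfo(predPath, getVarInfo = TRUE))

  # Same call, now also computing residuals.
  rxPredict(modelObject = myLM2, data = airData, outData = predPath,
            writeModelVars = TRUE, computeResiduals = TRUE, overwrite = TRUE)
  print(rxGetInfo(predPath, getVarInfo = TRUE))
}
```

Setting overwrite = TRUE is a choice made here so the second call can reuse the same output file; writing the residuals to a separate file would work as well.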

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

## summarize model first


## path to new dataset storing predictions


## generate predictions


## get information on the new dataset


## generate residuals


## get information on the new dataset