
Random Forest: visualization

Now you need to plot the predictions. For the gradient boosted trees model, you drew a scatter plot of predicted vs. actual responses and a density plot of the residuals. You will now adapt those plots to show the results from both models at once.

This exercise is part of the course Introduction to Spark with sparklyr in R.

Exercise instructions

A local tibble, both_responses, has been pre-defined. It contains the predicted and actual years for both models, plus a model column recording which model each row comes from.
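The exercise does not show how both_responses was built. One plausible approach, sketched here under stated assumptions, is to stack each model's predictions into a single long table with a model column. That column is what lets color = model split each plot by model. The tibbles gbt_responses and rf_responses, and all of their values, are hypothetical placeholders rather than the course's data.

```r
library(dplyr)

# Hypothetical per-model prediction tibbles; names and values are made up
gbt_responses <- tibble(
  actual    = c(1995, 2003, 2010),
  predicted = c(1998, 2001, 2007)
)
rf_responses <- tibble(
  actual    = c(1995, 2003, 2010),
  predicted = c(1997, 2004, 2006)
)

# Stack the rows; .id turns the list names into a "model" column
both_responses <- bind_rows(
  list(
    "gradient boosted trees" = gbt_responses,
    "random forest"          = rf_responses
  ),
  .id = "model"
)
both_responses
```

Keeping the data in this long format, with one row per prediction, means a single ggplot() call can draw both models and distinguish them by color instead of needing one layer per model.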

  • Update the predicted vs. actual response scatter plot.
    • Use the both_responses dataset.
    • Add a color aesthetic to draw each model in a different color. Use color = model.
    • Rather than drawing the points, use geom_smooth() to draw a smooth curve for each model.
  • Create a tibble of residuals, named residuals.
    • Call mutate() on both_responses.
    • The new column should be called residual.
    • residual should be equal to the predicted response minus the actual response.
  • Update the residual density plot.
    • Add a color aesthetic to draw each model in a different color.
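The instructions define the residual as the predicted response minus the actual response, so a positive residual means the model predicted a later year than the true one. The small sketch below checks that sign convention on two made-up rows; the tibble toy and its values are hypothetical.

```r
library(dplyr)

# Hypothetical rows for illustration only
toy <- tibble(
  model     = c("random forest", "random forest"),
  actual    = c(2000, 2000),
  predicted = c(2004, 1997)
)

# residual = predicted - actual, as the instructions specify
toy_residuals <- toy %>%
  mutate(residual = predicted - actual)

# 4 for the over-prediction, -3 for the under-prediction
toy_residuals$residual
```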

Sample code

The sample code below fills in each blank according to the instructions above.

# Load the plotting and data-manipulation packages
library(ggplot2)
library(dplyr)

# both_responses has been pre-defined
both_responses

# Draw a scatterplot of predicted vs. actual, one color per model
ggplot(both_responses, aes(actual, predicted, color = model)) +
  # Add a smoothed line for each model instead of points
  geom_smooth() +
  # Add a line at actual = predicted
  geom_abline(intercept = 0, slope = 1)

# Create a tibble of residuals
residuals <- both_responses %>%
  mutate(residual = predicted - actual)

# Draw a density plot of residuals, one color per model
ggplot(residuals, aes(residual, color = model)) +
  # Add a density curve
  geom_density() +
  # Add a vertical line through zero
  geom_vline(xintercept = 0)
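The density plot shows how each model's residuals are spread around zero. If you also want numbers to compare the two models, one option, sketched here as an assumption rather than part of the exercise, is to summarize the residuals per model. The residual values below are made up; with the real residuals tibble you would drop the hypothetical tibble() call and run the summarise step directly.

```r
library(dplyr)

# Hypothetical residual table with the same model and residual columns
residuals <- tibble(
  model    = rep(c("gradient boosted trees", "random forest"), each = 3),
  residual = c(3, -2, -3, 2, 1, -4)
)

residual_summary <- residuals %>%
  group_by(model) %>%
  summarise(
    mean_residual = mean(residual),        # > 0 means over-prediction on average
    sd_residual   = sd(residual),          # spread, as seen in the density plot
    rmse          = sqrt(mean(residual^2)) # root mean squared error
  )
residual_summary
```

A mean residual near zero corresponds to a density curve centered on the vertical line at zero, and a smaller RMSE corresponds to a narrower curve.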