Random Forest: visualization
Now you need to plot the predictions. With the gradient boosted trees model, you drew a scatter plot of predicted responses vs. actual responses, and a density plot of the residuals. You are now going to adapt those plots to display the results from both models at once.
This exercise is part of the course Introduction to Spark with sparklyr in R.
Exercise instructions
A local tibble, both_responses, containing predicted and actual years for both models, has been pre-defined.
- Update the predicted vs. actual response scatter plot.
  - Use the both_responses dataset.
  - Add a color aesthetic to draw each model in a different color. Use color = model.
  - Rather than drawing the points, use geom_smooth() to draw a smooth curve for each model.
- Create a tibble of residuals, named residuals.
  - Call mutate() on both_responses.
  - The new column should be called residual.
  - residual should be equal to the predicted response minus the actual response.
- Update the residual density plot.
  - Add a color aesthetic to draw each model in a different color.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# both_responses has been pre-defined
both_responses
# Draw a scatterplot of predicted vs. actual
ggplot(___, aes(actual, predicted, ___)) +
# Add a smoothed line
___ +
# Add a line at actual = predicted
geom_abline(intercept = 0, slope = 1)
# Create a tibble of residuals
residuals <- ___
# Draw a density plot of residuals
ggplot(residuals, aes(residual, ___)) +
# Add a density curve
geom_density() +
# Add a vertical line through zero
geom_vline(xintercept = 0)
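
One possible way to fill in the blanks is sketched below. It is not the official solution. Because both_responses is pre-defined only inside the exercise environment, the sketch builds a small simulated stand-in so that it can run on its own; the model labels, the year range, and the noise levels are assumptions made purely for illustration, as are the object names scatter_plot and density_plot.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the pre-defined both_responses tibble.
# The real data comes from the course; these values are simulated.
set.seed(42)
n <- 200
actual <- sample(1950:2010, n, replace = TRUE)

both_responses <- bind_rows(
  tibble(
    model = "gradient_boosted_trees",  # assumed label
    actual = actual,
    predicted = actual + rnorm(n, sd = 8)
  ),
  tibble(
    model = "random_forest",           # assumed label
    actual = actual,
    predicted = actual + rnorm(n, sd = 10)
  )
)

# Predicted vs. actual: one smooth curve per model, colored by model,
# plus a reference line where predicted equals actual
scatter_plot <- ggplot(both_responses, aes(actual, predicted, color = model)) +
  geom_smooth() +
  geom_abline(intercept = 0, slope = 1)

# Residual = predicted response minus actual response
residuals <- both_responses %>%
  mutate(residual = predicted - actual)

# Residual density: one curve per model, with a vertical line at zero
density_plot <- ggplot(residuals, aes(residual, color = model)) +
  geom_density() +
  geom_vline(xintercept = 0)

# Printing a ggplot object draws it
scatter_plot
density_plot
```

Mapping color = model inside aes() makes ggplot2 group the data by model, so geom_smooth() and geom_density() each draw a separate curve per model. A model whose smooth curve stays closer to the reference line, and whose residual density is more tightly centered on zero, is predicting better on this data.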