Predict with the soybean model on test data
In this exercise, you will apply the soybean models from the previous exercise (model.lin and model.gam, already loaded) to new data: soybean_test.
This exercise is part of the course Supervised Learning in R: Regression.
Exercise instructions
- Create a column soybean_test$pred.lin with predictions from the linear model model.lin.
- Create a column soybean_test$pred.gam with predictions from the gam model model.gam.
  - For GAM models, the predict() method returns a matrix, so use as.numeric() to convert the matrix to a vector.
- Fill in the blanks to pivot_longer() the prediction columns into a single value column pred with key column modeltype. Call the long data frame soybean_long.
- Calculate and compare the RMSE of both models.
  - Which model does better?
- Run the code to compare the predictions of each model against the actual average leaf weights.
  - A scatter plot of weight as a function of Time.
  - Point-and-line plots of the predictions (pred) as a function of Time.
  - Notice that the linear model sometimes predicts negative weights! Does the gam model?
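The prediction step described above can be sketched on made-up data. This is only an illustration: toy_train, toy_test, toy.lin and toy.gam are hypothetical stand-ins, not the course's soybean data or models, and the sketch assumes the mgcv package is installed.

```r
library(mgcv)  # provides gam() and its predict() method

# Hypothetical toy data standing in for the soybean training and test sets
set.seed(1)
toy_train <- data.frame(Time = seq(10, 80, length.out = 40))
toy_train$weight <- 0.004 * toy_train$Time^2 + rnorm(40, sd = 0.5)
toy_test <- data.frame(Time = c(15, 30, 45, 60, 75))

# Fit a linear model and a GAM on the same toy training data
toy.lin <- lm(weight ~ Time, data = toy_train)
toy.gam <- gam(weight ~ s(Time), data = toy_train, family = gaussian)

# predict() on the lm returns a plain numeric vector
toy_test$pred.lin <- predict(toy.lin, newdata = toy_test)

# predict() on the gam returns an array-like object, so wrap it in
# as.numeric() to store an ordinary vector in the data frame
toy_test$pred.gam <- as.numeric(predict(toy.gam, newdata = toy_test))

str(toy_test)
```

Wrapping the GAM predictions in as.numeric() keeps both prediction columns the same type, which avoids surprises when the columns are later reshaped or plotted together.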
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# soybean_test is available
summary(soybean_test)
# Get predictions from linear model
soybean_test$pred.lin <- ___(___, newdata = ___)
# Get predictions from gam model
soybean_test$pred.gam <- ___(___(___, newdata = ___))
# Pivot the predictions into a "long" dataset
soybean_long <- soybean_test %>%
  pivot_longer(cols = c(___, ___), names_to = ___, values_to = ___)
# Calculate the rmse
soybean_long %>%
  mutate(residual = weight - pred) %>%  # residuals
  group_by(modeltype) %>%               # group by modeltype
  summarize(rmse = ___(___(___)))       # calculate the RMSE
# Compare the predictions against actual weights on the test data
soybean_long %>%
  ggplot(aes(x = Time)) +                                  # the column for the x axis
  geom_point(aes(y = weight)) +                            # the y-column for the scatterplot
  geom_point(aes(y = pred, color = modeltype)) +           # the y-column for the point-and-line plot
  geom_line(aes(y = pred, color = modeltype, linetype = modeltype)) +  # the y-column for the point-and-line plot
  scale_color_brewer(palette = "Dark2")
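
As a rough illustration of the reshape-then-RMSE pattern used in the sample code, here is a minimal sketch on a tiny hand-made data frame. The frame toy and its values are invented for illustration and are not the soybean data; the sketch assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: actual weights plus one prediction column per model
toy <- data.frame(
  Time     = c(14, 21, 28),
  weight   = c(1, 2, 3),
  pred.lin = c(1.5, 2.5, 3.5),
  pred.gam = c(1, 2, 3.3)
)

# Stack the two prediction columns: the old column names go into
# modeltype and the predicted values go into pred
toy_long <- toy %>%
  pivot_longer(cols = c(pred.lin, pred.gam),
               names_to = "modeltype",
               values_to = "pred")

# RMSE per model: square root of the mean squared residual
toy_rmse <- toy_long %>%
  mutate(residual = weight - pred) %>%
  group_by(modeltype) %>%
  summarize(rmse = sqrt(mean(residual^2)))

toy_rmse
```

The long format gives one row per observation and model, so a single group_by() covers any number of models without repeating the RMSE code for each prediction column.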