
Predict with the soybean model on test data

In this exercise, you will apply the soybean models from the previous exercise (model.lin and model.gam, already loaded) to new data: soybean_test.

This exercise is part of the course

Supervised Learning in R: Regression


Exercise instructions

  • Create a column soybean_test$pred.lin with predictions from the linear model model.lin.
  • Create a column soybean_test$pred.gam with predictions from the gam model model.gam.
    • For GAM models, the predict() method returns a matrix rather than a plain vector, so wrap the call in as.numeric() to convert the result to a vector.
  • Fill in the blanks in the pivot_longer() call to gather the two prediction columns into a single value column, pred, with a key column, modeltype, that records which model each prediction came from. Call the long data frame soybean_long.
  • Calculate and compare the RMSE of both models.
    • Which model does better?
  • Run the code to plot the predictions of each model against the actual average leaf weights. The plot shows:
    • A scatter plot of weight as a function of Time.
    • Point-and-line plots of the predictions (pred) as a function of Time, one per model.
    • Notice that the linear model sometimes predicts negative weights! Does the gam model?
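
To see what the pivot step does to the shape of the data, here is a minimal sketch on a tiny hand-made data frame. The data frame toy_wide and its values are made up for illustration and are not the soybean data; only the column names mirror the exercise.

```r
library(tidyr)
library(dplyr)

# Hypothetical two-row "wide" data frame: one column per model's predictions
toy_wide <- data.frame(
  Time     = c(14, 21),
  weight   = c(0.1, 0.3),
  pred.lin = c(-0.5, 0.4),
  pred.gam = c(0.2, 0.3)
)

# Stack the two prediction columns into one value column ("pred"),
# with a key column ("modeltype") holding the original column name
toy_long <- toy_wide %>%
  pivot_longer(cols = c(pred.lin, pred.gam),
               names_to = "modeltype", values_to = "pred")

print(toy_long)
```

Each original row becomes two rows, one per model, so the long data frame can be grouped or colored by modeltype.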

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# soybean_test is available
summary(soybean_test)

# Get predictions from linear model
soybean_test$pred.lin <- ___(___, newdata = ___)

# Get predictions from gam model
soybean_test$pred.gam <- ___(___(___, newdata = ___))

# Pivot the predictions into a "long" dataset
soybean_long <- soybean_test %>%
  pivot_longer(cols = c(___, ___), names_to = ___, values_to = ___)

# Calculate the rmse
soybean_long %>%
  mutate(residual = weight - pred) %>%     # residuals
  group_by(modeltype) %>%                  # group by modeltype
  summarize(rmse = ___(___(___)))          # calculate the RMSE

# Compare the predictions against actual weights on the test data
soybean_long %>%
  ggplot(aes(x = Time)) +                          # the column for the x axis
  geom_point(aes(y = weight)) +                    # the y-column for the scatterplot
  geom_point(aes(y = pred, color = modeltype)) +   # the y-column for the point-and-line plot
  geom_line(aes(y = pred, color = modeltype, linetype = modeltype)) + # the y-column for the point-and-line plot
  scale_color_brewer(palette = "Dark2")
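
The same predict-then-score workflow can be sketched end to end on synthetic data. Everything below is an assumption made for illustration: the data are simulated from an S-shaped growth curve rather than taken from the soybean data, and toy_train, toy_test, toy_lin, toy_gam, and the rmse() helper are hypothetical names, not objects from the course. The GAM is fit with mgcv's gam(), which may differ from how the course models were built.

```r
library(mgcv)  # provides gam(); mgcv is a recommended package in standard R installs

set.seed(42)

# Synthetic stand-in for the soybean data (NOT the course data):
# weight follows an S-shaped curve in Time, plus Gaussian noise
make_data <- function(n) {
  Time <- runif(n, min = 14, max = 84)
  weight <- 20 / (1 + exp(-(Time - 50) / 8)) + rnorm(n, sd = 1)
  data.frame(Time = Time, weight = weight)
}
toy_train <- make_data(200)
toy_test  <- make_data(80)

# Fit a linear model and a GAM with a smooth term on Time
toy_lin <- lm(weight ~ Time, data = toy_train)
toy_gam <- gam(weight ~ s(Time), data = toy_train, family = gaussian)

# Predict on the held-out data; as.numeric() strips the array
# attributes from the GAM predictions so the column is a plain vector
toy_test$pred.lin <- predict(toy_lin, newdata = toy_test)
toy_test$pred.gam <- as.numeric(predict(toy_gam, newdata = toy_test))

# RMSE: square root of the mean squared residual
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

rmse_lin <- rmse(toy_test$weight, toy_test$pred.lin)
rmse_gam <- rmse(toy_test$weight, toy_test$pred.gam)
print(c(lin = rmse_lin, gam = rmse_gam))
```

Because the simulated curve is strongly nonlinear, the straight-line fit should leave systematic error that the smooth term can absorb; whether the same holds on soybean_test is what the exercise asks you to check.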
  