Exercise

Input transforms: the "hockey stick" (2)

In the last exercise you saw that a quadratic model seems to fit the houseprice data better than a linear model. In this exercise you will check whether the quadratic model also performs better on out-of-sample data. Because the data set is small, you will estimate out-of-sample performance with cross-validation rather than a single train/test split. The quadratic formula fmla_sqr that you created in the last exercise is in your workspace.

For comparison, the sample code will calculate cross-validation predictions from a linear model price ~ size.

Instructions

The data frame houseprice and the formula fmla_sqr from the last exercise are in the workspace.

  • Use kWayCrossValidation() to create a splitting plan for a 3-fold cross-validation.
    • You can set the 3rd and 4th arguments of the function to NULL.
  • Examine and run the sample code to get the 3-fold cross-validation predictions of the model price ~ size and add them to the column pred_lin.
  • Get the cross-validation predictions for price as a function of squared size. Assign them to the column pred_sqr.
    • The sample code gives you the procedure.
    • You can use the splitting plan you already created.
  • Fill in the blanks to gather the predictions and calculate the residuals.
  • Fill in the blanks to compare the RMSE for the two models. Which one fits better?
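
The steps above can be sketched roughly as follows. This is a minimal illustration and not the exercise's sample code. It runs on a small synthetic data frame (demo) rather than the real houseprice data, it assumes the quadratic formula has the form price ~ I(size^2), and the helpers cv_predict() and rmse() are hypothetical names introduced here. It also computes the RMSE directly in base R instead of gathering the predictions into long form first.

```r
library(vtreat)  # provides kWayCrossValidation()

# Hypothetical stand-in data: NOT the course's houseprice data
set.seed(42)
demo <- data.frame(size = seq(50, 250, length.out = 30))
demo$price <- 100 + 0.005 * demo$size^2 + rnorm(30, sd = 5)

fmla_lin <- price ~ size
fmla_sqr <- price ~ I(size^2)  # assumed form of the quadratic formula

# 3-fold splitting plan; the 3rd and 4th arguments are set to NULL
splitPlan <- kWayCrossValidation(nrow(demo), 3, NULL, NULL)

# Hypothetical helper: fit on each fold's training rows and
# predict on that fold's held-out (application) rows
cv_predict <- function(fmla, data, plan) {
  pred <- numeric(nrow(data))
  for (split in plan) {
    model <- lm(fmla, data = data[split$train, ])
    pred[split$app] <- predict(model, newdata = data[split$app, ])
  }
  pred
}

# Reuse the same splitting plan for both models
demo$pred_lin <- cv_predict(fmla_lin, demo, splitPlan)
demo$pred_sqr <- cv_predict(fmla_sqr, demo, splitPlan)

# Hypothetical helper: root mean squared error of the residuals
rmse <- function(pred, actual) sqrt(mean((pred - actual)^2))

rmse_lin <- rmse(demo$pred_lin, demo$price)
rmse_sqr <- rmse(demo$pred_sqr, demo$price)
print(c(linear = rmse_lin, quadratic = rmse_sqr))
```

Because every row is predicted by a model that never saw it during fitting, the two RMSE values estimate out-of-sample error. Using the same splitting plan for both models keeps the comparison fair, since both are evaluated on identical folds.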