Modeling an interaction (2)

In this exercise, you will compare the performance of the interaction model you fit in the previous exercise to the performance of a main-effects-only model. Because this dataset is small, you will use cross-validation to simulate making predictions on out-of-sample data.

You will begin to use the dplyr package to do calculations.

- mutate() adds new columns to a tbl (a type of data frame).
- group_by() specifies how rows are grouped in a tbl.
- summarize() computes summary statistics of a column.
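
As a rough illustration of how these three verbs chain together, here is a minimal sketch on a made-up data frame. The name scores and its columns (group, actual, pred) are hypothetical and are not part of the exercise data.

```r
library(dplyr)

# Hypothetical toy data: two groups with actual and predicted values
scores <- tibble(
  group  = c("a", "a", "b", "b"),
  actual = c(1, 2, 3, 4),
  pred   = c(1.5, 1.5, 2, 5)
)

rmse_by_group <- scores %>%
  mutate(residual = actual - pred) %>%      # add a new column
  group_by(group) %>%                       # group rows by the group column
  summarize(rmse = sqrt(mean(residual^2)))  # one summary row per group

print(rmse_by_group)
```

Each group collapses to a single row holding its root mean squared error, which is the same pattern the exercise applies per model type.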

You will also use tidyr's pivot_longer(), which takes multiple columns and collapses them into key-value pairs. The alcohol data frame and the formulas fmla_add and fmla_interaction have been pre-loaded.
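
To make the key-value reshaping concrete, the following sketch applies pivot_longer() to a small made-up data frame. The data frame preds and its values are hypothetical; only the column names pred_add and pred_interaction mirror the ones used later in the exercise.

```r
library(tidyr)

# Hypothetical wide data: one prediction column per model
preds <- data.frame(
  actual           = c(1, 2),
  pred_add         = c(1.1, 1.9),
  pred_interaction = c(0.9, 2.2)
)

# Collapse the two prediction columns into a key column (modeltype)
# and a value column (pred); actual is repeated for each model
preds_long <- pivot_longer(
  preds,
  cols      = c(pred_add, pred_interaction),
  names_to  = "modeltype",
  values_to = "pred"
)

print(preds_long)
```

The two input rows become four: one row per combination of original row and model type.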

Use kWayCrossValidation() to create a splitting plan for a 3-fold cross-validation.
- The first argument is the number of rows to be split.
- The second argument is the number of folds for the cross-validation.
- You can set the 3rd and 4th arguments of the function to NULL.
Examine and run the sample code to get the 3-fold cross-validation predictions of a model with no interactions and assign them to the column pred_add.
Get the 3-fold cross-validation predictions of the model with interactions. Assign the predictions to the column pred_interaction.
- The sample code shows you the procedure.
- Use the same splitPlan that you already created.
Fill in the blanks to:
- use pivot_longer() to gather the predictions into a single column, pred.
- add a column of residuals (actual outcome - predicted outcome).
- get the RMSE of the cross-validation predictions for each model type.
Compare the RMSEs. Based on these results, which model should you use?
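
The steps above can be sketched end to end as follows. This is one possible outline under stated assumptions, not the exercise's official solution: the pre-loaded alcohol data and formulas are not shown in this text, so the code builds a synthetic stand-in, and the column names Metabol, Gastric, and Sex as well as the two formula definitions are assumed for illustration.

```r
library(vtreat)
library(dplyr)
library(tidyr)

set.seed(34245)

# ASSUMPTION: synthetic stand-in for the pre-loaded alcohol data frame
n <- 32
alcohol <- data.frame(
  Gastric = runif(n, 0.8, 5),
  Sex     = factor(rep(c("Female", "Male"), each = n / 2))
)
alcohol$Metabol <- ifelse(alcohol$Sex == "Male", 1.6, 0.8) * alcohol$Gastric +
  rnorm(n, sd = 0.5)

# ASSUMPTION: the pre-loaded formulas are taken to look like this
fmla_add         <- Metabol ~ Gastric + Sex
fmla_interaction <- Metabol ~ Gastric + Gastric:Sex

# 3-fold splitting plan; the 3rd and 4th arguments are left as NULL
splitPlan <- kWayCrossValidation(nrow(alcohol), 3, NULL, NULL)

# Fit on each fold's training rows, predict on its held-out rows
alcohol$pred_add <- 0
alcohol$pred_interaction <- 0
for (i in 1:3) {
  split <- splitPlan[[i]]
  train <- alcohol[split$train, ]
  held_out <- alcohol[split$app, ]
  alcohol$pred_add[split$app] <-
    predict(lm(fmla_add, data = train), newdata = held_out)
  alcohol$pred_interaction[split$app] <-
    predict(lm(fmla_interaction, data = train), newdata = held_out)
}

# Gather predictions, compute residuals, and get the RMSE per model type
rmse_by_model <- alcohol %>%
  pivot_longer(c(pred_add, pred_interaction),
               names_to = "modeltype", values_to = "pred") %>%
  mutate(residuals = Metabol - pred) %>%
  group_by(modeltype) %>%
  summarize(rmse = sqrt(mean(residuals^2)))

print(rmse_by_model)
```

Because both models reuse the same splitPlan, each row is predicted by models that never saw it during fitting, and the two RMSEs are computed on identical held-out rows. The numbers produced here come from synthetic data and say nothing about which model wins on the real alcohol data.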
