Validation set approach
In the chapter on linear regression, you fit a linear regression model that explains cats' heart weights by their body weights. The job interviewer asks you to evaluate how good your model is.
To answer this question, you need to derive predictions that can be compared against the actual values. In the validation set approach, you divide your data into two parts: a train set used to fit the model and a test set used to evaluate it.
To do that, you can first take a random sample of, say, 80% of the row numbers. Use the chosen row numbers to subset the data frame into the train set. The remaining rows of the data frame can be used for testing.
Remember that:
rows <- c(1, 3)
df[-rows, ]
returns all rows except the first and the third.
The cats dataset is available in your environment.
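Outside the exercise environment, the same steps can be sketched on a built-in dataset. The example below is only an illustration under stated assumptions: it uses mtcars instead of cats, models mpg by wt, and scores the predictions with root mean squared error (RMSE), a metric the exercise itself does not prescribe.

```r
# Illustration on the built-in mtcars data (an assumption; the exercise uses cats)
set.seed(123)

# Sample 80% of the row numbers for training
n <- nrow(mtcars)
train_rows <- sample(n, round(0.8 * n))

# Positive indices pick the train rows; negative indices keep everything else
train <- mtcars[train_rows, ]
test <- mtcars[-train_rows, ]

# Fit on the train set only, then predict the held-out rows
model <- lm(mpg ~ wt, data = train)
pred <- predict(model, newdata = test)

# RMSE is one possible way to compare predictions against the actual values
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```

Because the model never sees the test rows while fitting, the error computed on them gives a fairer picture of how the model behaves on new data than the error on the train set would.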
This exercise is part of the course Practicing Statistics Interview Questions in R.
Have a go at this exercise by completing this sample code.
set.seed(123)
# Generate train row numbers
train_rows <- ___(nrow(___), round(0.8 * ___(cats)))