Validation set approach
In the chapter on linear regression, you fit a linear regression model that explains cats' heart weights by their body weights. The job interviewer asks you to evaluate how good your model is.
To answer this question, you need to derive predictions that can be compared against the actual values. In the validation set approach, you divide your data into two parts: a train set used to fit the model and a test set used to evaluate it.
To do that, you can first take a random sample of, say, 80% of the row numbers. Use the chosen row numbers to subset the data frame into the train set. The remaining rows of the data frame can be used for testing.
Remember that:
rows <- c(1, 3)
df[-rows, ]
subsets all but the first and the third row.
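As a small illustration of negative indexing, here is a sketch on a made-up data frame. The data frame `df` and its columns `id` and `value` are hypothetical and exist only for this example.

```r
# Hypothetical four-row data frame, used only to illustrate indexing
df <- data.frame(id = 1:4, value = c(10, 20, 30, 40))

rows <- c(1, 3)

# Positive indices keep the listed rows (here: rows 1 and 3)
picked <- df[rows, ]

# Negative indices drop the listed rows and keep the rest (here: rows 2 and 4)
kept <- df[-rows, ]

print(picked)
print(kept)
```

The two subsets never share a row, which is what makes this pattern convenient for splitting data into train and test sets.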
The cats dataset is available in your environment.
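The whole validation set workflow described above could look roughly like the following sketch. It runs on simulated data rather than on cats: the data frame `toy`, its column names `Bwt` and `Hwt`, and the simulated relationship between them are assumptions made for illustration. The root mean squared error (RMSE) is just one possible way to compare predictions with actual values; the exercise itself does not prescribe a metric.

```r
set.seed(123)

# Simulated stand-in data (an assumption, NOT the course's cats dataset)
toy <- data.frame(Bwt = runif(50, min = 2, max = 4))
toy$Hwt <- 4 * toy$Bwt + rnorm(50)

# Sample 80% of the row numbers for the train set
train_rows <- sample(nrow(toy), round(0.8 * nrow(toy)))

# Chosen rows form the train set; all remaining rows form the test set
train <- toy[train_rows, ]
test <- toy[-train_rows, ]

# Fit the model on the train set only
model <- lm(Hwt ~ Bwt, data = train)

# Predict on rows the model has not seen
pred <- predict(model, newdata = test)

# Compare predictions with actual values via RMSE (one possible metric)
rmse <- sqrt(mean((test$Hwt - pred)^2))
print(rmse)
```

Evaluating on the held-out rows matters because an error computed on the train set tends to be too optimistic about how the model performs on new data.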
This exercise is part of the course
Practicing Statistics Interview Questions in R
Interactive exercise
Try this exercise by completing the sample code.
set.seed(123)
# Generate train row numbers
train_rows <- ___(nrow(___), round(0.8 * ___(cats)))