
Validation set approach

In the chapter on linear regression, you fit a linear regression model that explains cats' heart weights by their body weights. The job interviewer asks you to evaluate how good your model is.

To answer this question, you need predictions that can be compared against the actual values. In the validation set approach, you divide your data into two parts: a train set used to fit the model and a test set used to evaluate it.

To do that, you can first take a random sample of, say, 80% of the row numbers. Use the chosen row numbers to select the train set. The remaining rows of the data frame can be used for testing.

Remember that:

rows <- c(1, 3)
df[-rows, ]

returns all rows except the first and the third.
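The sampling and negative-indexing steps can be tried on a small example before applying them to real data. The sketch below uses a made-up data frame `df` (hypothetical, for illustration only) and the names `train` and `test`, which are assumptions rather than names required by the exercise.

```r
# Toy data frame (hypothetical, for illustration only)
df <- data.frame(id = 1:10, x = seq(10, 100, by = 10))

set.seed(123)

# Sample 80% of the row numbers without replacement
train_rows <- sample(nrow(df), round(0.8 * nrow(df)))

train <- df[train_rows, ]   # rows chosen by the sample
test  <- df[-train_rows, ]  # every row that was not chosen

nrow(train)                            # 8
nrow(test)                             # 2
length(intersect(train$id, test$id))   # 0, the two parts do not overlap
```

Because the test set is defined as the complement of the sampled rows, every row ends up in exactly one of the two parts.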

The cats dataset is available in your environment.

This exercise is part of the course

Practicing Statistics Interview Questions in R


Interactive exercise

Try this exercise by completing the sample code.

set.seed(123)

# Generate train row numbers
train_rows <- ___(nrow(___), round(0.8 * ___(cats)))
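For reference, one possible way the whole validation set approach might look end to end is sketched below. It assumes that `cats` is the data set shipped with the MASS package (with body weight in `Bwt` and heart weight in `Hwt`), and it uses the root mean squared error as the evaluation metric; the names `train`, `test`, `model`, `pred`, and `rmse` and the choice of metric are assumptions, not necessarily what the course expects.

```r
# Assumption: `cats` is MASS::cats, with columns Bwt (body weight)
# and Hwt (heart weight)
data(cats, package = "MASS")

set.seed(123)

# Generate train row numbers: a random 80% of all rows
train_rows <- sample(nrow(cats), round(0.8 * nrow(cats)))

# Split the data into a train set and a test set
train <- cats[train_rows, ]
test  <- cats[-train_rows, ]

# Fit the linear regression on the train set only
model <- lm(Hwt ~ Bwt, data = train)

# Predict heart weights for the unseen test rows
pred <- predict(model, newdata = test)

# Compare predictions with actual values via root mean squared error
rmse <- sqrt(mean((test$Hwt - pred)^2))
rmse
```

The RMSE is expressed in the same units as `Hwt`, so it can be read as a typical size of the prediction error on data the model has not seen.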