Validation set approach

In the chapter on linear regression, you fit a linear regression model that explains cats' heart weights by their body weights. The job interviewer asks you to evaluate how good your model is.

To answer this question, you need predictions that can be compared against the actual values. In the validation set approach, you divide your data into two parts: a train set used to fit the model and a test set used to evaluate it.

To do that, you can first sample, say, 80% of the row numbers. Use the sampled row numbers to subset the data frame and obtain the train set. The remaining rows can be used for testing.
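The split described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the exercise's solution: the data frame `df`, its columns `x` and `y`, the seed, and the use of RMSE as the comparison metric are all hypothetical choices made for the example.

```r
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible

# Hypothetical toy data frame standing in for a real dataset
df <- data.frame(x = 1:10)
df$y <- 2 * df$x + rnorm(10)

# Sample 80% of the row numbers (without replacement, the default)
train_rows <- sample(nrow(df), round(0.8 * nrow(df)))

train <- df[train_rows, ]   # rows selected for fitting the model
test  <- df[-train_rows, ]  # all remaining rows, held out for testing

# Fit on the train set, then predict for the held-out rows
model <- lm(y ~ x, data = train)
predicted <- predict(model, newdata = test)

# One possible way to compare predictions against actual values
rmse <- sqrt(mean((test$y - predicted)^2))
```

Because the test rows play no part in fitting, the error computed on them gives a less optimistic picture of the model than the error on the train rows would.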

Remember that:

rows <- c(1, 3)
df[-rows, ]

selects all rows except the first and the third.
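A small sketch of how positive and negative row indices complement each other, using a hypothetical five-row data frame (the names `df`, `id`, and `value` are made up for illustration):

```r
# Hypothetical data frame with five rows
df <- data.frame(id = 1:5, value = c(10, 20, 30, 40, 50))

rows <- c(1, 3)

kept    <- df[rows, ]   # only rows 1 and 3
dropped <- df[-rows, ]  # every row except 1 and 3, i.e. rows 2, 4 and 5
```

Together, `kept` and `dropped` cover every row of `df` exactly once, which is what makes negative indexing convenient for building a test set from the rows not chosen for training.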

The cats dataset is available in your environment.

This exercise is part of the course

Practicing Statistics Interview Questions in R

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

set.seed(123)

# Generate train row numbers
train_rows <- ___(nrow(___), round(0.8 * ___(cats)))