Generating a random test/train split

For the next several exercises you will use the mpg data from the package ggplot2. The data describes the characteristics of several makes and models of cars from different years. The goal is to predict city fuel efficiency from highway fuel efficiency.

In this exercise, you will split mpg into a training set mpg_train (75% of the data) and a test set mpg_test (25% of the data). One way to do this is to generate a vector of uniform random numbers between 0 and 1, one per row, using the function runif().

If you have a dataset dframe with \(N\) rows, and you want a random subset containing approximately \(100 \cdot X\)% of those rows (where \(X\) is between 0 and 1), then:

  1. Generate a vector of N uniform random numbers: gp <- runif(N).
  2. dframe[gp < X, ] will contain about \(X \cdot N\) rows.
  3. dframe[gp >= X, ] will be the complement, containing the remaining rows.
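
The three steps above can be sketched on a small example. This is only an illustration: the toy data frame, the names subset_a and subset_b, and the call to set.seed() (added so the result is reproducible) are assumptions that are not part of the exercise.

```r
# Hypothetical toy data standing in for dframe
set.seed(42)                          # assumption: fixed seed for reproducibility
dframe <- data.frame(id = 1:1000, value = rnorm(1000))

N <- nrow(dframe)                     # number of rows
X <- 0.75                             # desired fraction for the first subset

gp <- runif(N)                        # step 1: one uniform draw in [0, 1] per row

subset_a <- dframe[gp < X, ]          # step 2: roughly X * N rows
subset_b <- dframe[gp >= X, ]         # step 3: the complement

# Every row lands in exactly one of the two subsets
nrow(subset_a) + nrow(subset_b) == N
```

Because each row is assigned independently, the size of subset_a varies from run to run around \(X \cdot N\) rather than matching it exactly.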

This exercise is part of the course

Supervised Learning in R: Regression

Exercise instructions

  • Use the function nrow() to get the number of rows in the data frame mpg. Assign this count to the variable N and print it.
  • Calculate about how many rows 75% of N should be. Assign it to the variable target and print it.
  • Use runif() to generate a vector of N uniform random numbers, called gp.
  • Use gp to split mpg into mpg_train and mpg_test (with mpg_train containing approximately 75% of the data).
  • Use nrow() to check the size of mpg_train and mpg_test. Are they about the right size?

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# mpg is available
summary(mpg)
dim(mpg)

# Use nrow to get the number of rows in mpg (N) and print it
(N <- ___)

# Calculate how many rows 75% of N should be and print it
# Hint: use round() to get an integer
(target <- ___)

# Create the vector of N uniform random variables: gp
gp <- ___

# Use gp to create the training set: mpg_train (75% of data) and mpg_test (25% of data)
mpg_train <- ___
mpg_test <- ___

# Use nrow() to examine mpg_train and mpg_test
___
___
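
One possible way to complete the template is sketched below. It assumes the ggplot2 package is installed so that mpg can be loaded, and it adds a set.seed() call that the exercise does not ask for, purely so the split can be reproduced.

```r
library(ggplot2)                      # provides the mpg data frame

set.seed(1)                           # assumption: seed added only for reproducibility

# Number of rows in mpg
(N <- nrow(mpg))

# About how many rows 75% of N should be
(target <- round(0.75 * N))

# One uniform random number per row
gp <- runif(N)

# Rows with gp < 0.75 go to training, the rest to test
mpg_train <- mpg[gp < 0.75, ]
mpg_test <- mpg[gp >= 0.75, ]

# Compare the actual sizes with target
nrow(mpg_train)
nrow(mpg_test)
```

The training set size will usually be close to target but not equal to it, since the number of rows with gp < 0.75 is itself random.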