Randomly order the data frame

One way to make a train/test split of a dataset is to order it randomly and then divide it into two sets. This makes both the training set and the test set random samples, so any pattern in the original ordering (e.g. if the dataset had been sorted by price or size) does not carry over into the samples you use for training and testing your models. You can think of this like shuffling a brand new deck of playing cards before dealing hands.

First, you set a random seed so that your work is reproducible and you get the same random split each time you run your script:

set.seed(42)
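
As a small illustration of why the seed matters, the sketch below draws the same shuffle twice. The names first_draw and second_draw are made up for this example, and the vector length of 10 is arbitrary.

```r
# Resetting the seed replays the same sequence of random numbers,
# so the two shuffles of 1..10 come out identical.
set.seed(42)
first_draw <- sample(10)

set.seed(42)
second_draw <- sample(10)

identical(first_draw, second_draw)  # TRUE
```

Without the second set.seed(42) call, the second shuffle would continue the random stream and would generally differ from the first.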

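Next, you use the sample() function to shuffle the row indices of the diamonds dataset. Called with a single number n, sample() returns the integers from 1 to n in random order, so every row index appears exactly once. You can later use these indices to reorder the dataset.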

rows <- sample(nrow(diamonds))
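
To see what this index vector looks like, here is a minimal sketch on a tiny made-up data frame. The data frame toy and its columns are assumptions for illustration only and stand in for diamonds.

```r
# A six-row stand-in data frame (hypothetical, not the diamonds data).
toy <- data.frame(id = 1:6, price = c(10, 20, 30, 40, 50, 60))

set.seed(42)
rows <- sample(nrow(toy))

# rows is a permutation of 1..6: same length as the data frame,
# and each row index appears exactly once.
length(rows) == nrow(toy)
identical(sort(rows), seq_len(nrow(toy)))
```

Because each index appears exactly once, indexing with rows reorders the data without dropping or duplicating any row.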

Finally, you can use this random vector to reorder the diamonds dataset:

diamonds <- diamonds[rows, ]
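
The text above stops at the shuffle. As a rough sketch of the dividing step that follows, the code below shuffles a data frame and cuts it into two parts. It uses the built-in mtcars dataset as a stand-in so that it runs without extra packages, and the 80/20 proportion and the names shuffled, split_at, train and test are assumptions, not something this exercise prescribes.

```r
set.seed(42)

# Shuffle the rows (mtcars is a stand-in for diamonds here).
rows <- sample(nrow(mtcars))
shuffled <- mtcars[rows, ]

# Assumed 80/20 split: the first 80% of shuffled rows go to training.
split_at <- round(nrow(shuffled) * 0.80)
train <- shuffled[1:split_at, ]
test  <- shuffled[(split_at + 1):nrow(shuffled), ]

# Sanity checks: no rows lost, and no row lands in both sets.
nrow(train) + nrow(test) == nrow(mtcars)
length(intersect(rownames(train), rownames(test))) == 0
```

Since the rows were shuffled first, taking the leading block as the training set amounts to drawing a random sample, whatever order the original data was stored in.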

This exercise is part of the course

Machine Learning with caret in R


Exercise instructions

Set the random seed to 42.
Make a vector of row indices called rows.
Randomly reorder the diamonds data frame, assigning to shuffled_diamonds.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Set seed


# Shuffle row indices: rows


# Randomly order data
