Randomly order the data frame

One way to create a train/test split is to order the dataset randomly and then divide it into two sets. This ensures that the training set and test set are both random samples, and that any bias in the original ordering (e.g. if the data had been sorted by price or size) is not carried into the samples you use to train and test your models. You can think of this as shuffling a brand-new deck of playing cards before dealing hands.
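To see why the original ordering matters, here is a minimal sketch of a split taken without shuffling. The data frame toy is a hypothetical stand-in for a dataset stored in ascending price order, and the 80/20 ratio is an assumption chosen only for illustration.

```r
# Hypothetical stand-in for a dataset stored in ascending price order
toy <- data.frame(price = seq(100, 1000, by = 100))

# Naive split: first 80% for training, last 20% for testing, no shuffling
# (the 80/20 ratio is an assumption for this sketch)
n_train <- round(0.8 * nrow(toy))
train_naive <- toy[1:n_train, , drop = FALSE]
test_naive  <- toy[(n_train + 1):nrow(toy), , drop = FALSE]

# Check whether every test price lies above every training price;
# if so, the model would be evaluated only on values it never saw in training
max(train_naive$price) < min(test_naive$price)
```

With sorted data like this, the test set contains only the most expensive rows, which is exactly the ordering bias that random reordering is meant to remove.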

First, you set a random seed so that your work is reproducible and you get the same random split each time you run your script:

set.seed(42)

Next, you use the sample() function to shuffle the row indices of the diamonds dataset. You can later use these indices to reorder the dataset.

rows <- sample(nrow(diamonds))

Finally, you can use this random vector to reorder the diamonds dataset:

diamonds <- diamonds[rows, ]
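The three steps can be tried end to end on a small example. This is a sketch only: toy is a hypothetical stand-in for diamonds (which ships with the ggplot2 package), and the 80/20 split at the end is an assumed ratio, not something this exercise prescribes.

```r
# Hypothetical stand-in for diamonds: ten rows sorted by price
toy <- data.frame(id = 1:10, price = seq(100, 1000, by = 100))

set.seed(42)                 # fix the RNG state so the shuffle is repeatable
rows <- sample(nrow(toy))    # random permutation of the row indices 1..10
shuffled <- toy[rows, ]      # reorder the rows using the permuted indices

# The shuffled data holds the same rows, only in a different order
stopifnot(setequal(shuffled$id, toy$id))

# Resetting the same seed reproduces the same permutation
set.seed(42)
stopifnot(identical(rows, sample(nrow(toy))))

# Assumed 80/20 split of the shuffled rows into training and test sets
n_train <- round(0.8 * nrow(shuffled))
train <- shuffled[1:n_train, ]
test  <- shuffled[(n_train + 1):nrow(shuffled), ]
```

Because the rows were shuffled first, taking the leading 80% is equivalent to drawing a random sample without replacement.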

This exercise is part of the course

Machine Learning with caret in R

Exercise instructions

  • Set the random seed to 42.
  • Make a vector of row indices called rows.
  • Randomly reorder the diamonds data frame, assigning to shuffled_diamonds.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Set seed


# Shuffle row indices: rows


# Randomly order data