Split into train and test
Now that we have a data frame, we can apply standard modeling techniques. In this exercise, you will split the data into a training set and a test set.
This exercise is part of the course Predictive Analytics using Networked Data in R.
Instructions
- To ensure the reproducibility of your results, set the seed to 7 using set.seed().
- Use the sample() function to draw two-thirds of the row numbers of studentnetworkdata, that is, sample from the sequence running from 1 to the total number of rows. Name this vector index_train.
- Create the training set by keeping the rows of studentnetworkdata whose numbers are stored in index_train, and name it training_set.
- Create the test set by excluding the rows of studentnetworkdata whose numbers are stored in index_train, and name it test_set.
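The reproducibility step in the first instruction can be checked directly: resetting the same seed before calling sample() reproduces exactly the same draw. A minimal sketch, using the plain sequence 1:30 in place of the row numbers (the length 30 is an arbitrary assumption, not part of the exercise):

```r
# Number of hypothetical "rows" to sample from (arbitrary choice)
n <- 30

# First draw: seed 7, then sample two-thirds of 1:n
set.seed(7)
draw_1 <- sample(1:n, 2 / 3 * n)

# Second draw: reset the same seed and repeat the identical call
set.seed(7)
draw_2 <- sample(1:n, 2 / 3 * n)

# Same seed and same call give the same indices, in the same order
identical(draw_1, draw_2)  # TRUE
length(draw_1)             # 20
```

Without the second set.seed(7), the next call to sample() would continue from the current random-number state and would generally return a different set of indices.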
Sample code for this exercise:
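# Set the seed so the random split is reproducible
set.seed(7)
# Create the index vector: a random two-thirds of the row numbers
index_train <- sample(1:nrow(studentnetworkdata), 2 / 3 * nrow(studentnetworkdata))
# Make the training set from the sampled rows
training_set <- studentnetworkdata[index_train, ]
# Make the test set from all remaining rows (negative indices drop rows)
test_set <- studentnetworkdata[-index_train, ]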
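Because studentnetworkdata is only available inside the course environment, here is a minimal sketch of the same split on a small made-up data frame. The name toy_data and its columns are hypothetical stand-ins, not the course data. The sketch also checks that the two subsets form a partition of the original rows: no row appears in both, and none is lost.

```r
# Hypothetical stand-in for studentnetworkdata: 30 rows, two made-up columns
toy_data <- data.frame(id = 1:30, label = rep(c(0, 1), times = 15))

# Same recipe as the exercise: seed, index vector, then subset by rows
set.seed(7)
index_train <- sample(1:nrow(toy_data), 2 / 3 * nrow(toy_data))
training_set <- toy_data[index_train, ]
test_set <- toy_data[-index_train, ]

# Two-thirds of the rows go to training, the remaining third to testing
nrow(training_set)  # 20
nrow(test_set)      # 10

# The sets are disjoint and together cover every original row
length(intersect(training_set$id, test_set$id)) == 0  # TRUE
setequal(c(training_set$id, test_set$id), toy_data$id)  # TRUE
```

Sampling without replacement (the default for sample()) is what guarantees that index_train contains no duplicates, so excluding those rows with a negative index leaves exactly the complementary set.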