Data resampling
The first step in a machine learning project is to create training and test datasets for model fitting and evaluation. The test dataset provides an estimate of how your model will perform on new data and helps to guard against overfitting.
You will be working with the telecom_df dataset which contains information on customers of a telecommunications company. The outcome variable is canceled_service and it records whether a customer canceled their contract with the company. The predictor variables contain information about customers' cell phone and Internet usage as well as their contract type and monthly charges.
The telecom_df tibble has been loaded into your session.
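Before stratifying by an outcome, it can help to see how unbalanced that outcome is. The following is a minimal base R sketch on a small made-up data frame: toy_df and its values are hypothetical stand-ins, not the real telecom_df.

```r
# Hypothetical stand-in for telecom_df; the real columns and values will differ
toy_df <- data.frame(
  canceled_service = factor(c("yes", "no", "no", "no", "yes",
                              "no", "no", "no", "no", "no"))
)

# Counts and proportions of each outcome class
outcome_counts <- table(toy_df$canceled_service)
outcome_props  <- prop.table(outcome_counts)

outcome_counts
outcome_props
```

When one class is much rarer than the other, a purely random split can leave the training and test datasets with noticeably different class proportions, which is the situation stratification is meant to avoid.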
This exercise is part of the course
Modeling with tidymodels in R
Exercise instructions
- Create an rsample object, telecom_split, that contains the instructions for randomly splitting the telecom_df data into training and test datasets.
  - Allocate 75% of the data into training and stratify the results by canceled_service.
- Pass the telecom_split object to the appropriate rsample functions to create the training and test datasets.
- Check the number of rows in each dataset by passing them to the nrow() function.
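The same workflow can be sketched on made-up data. This assumes the rsample package is installed and uses its initial_split(), training(), and testing() functions. The toy_df data frame, its columns, and the seed are hypothetical and only stand in for telecom_df.

```r
library(rsample)

set.seed(42)  # make the random split reproducible

# Hypothetical stand-in for telecom_df: 200 customers, 60 of whom canceled
toy_df <- data.frame(
  canceled_service = factor(rep(c("yes", "no"), times = c(60, 140))),
  monthly_charges  = round(runif(200, min = 20, max = 120), 2)
)

# Instructions for a 75/25 split, stratified by the outcome
toy_split <- initial_split(toy_df, prop = 0.75, strata = canceled_service)

# Materialize the two datasets from the split object
toy_training <- training(toy_split)
toy_test     <- testing(toy_split)

# Number of rows in each dataset
nrow(toy_training)
nrow(toy_test)

# Share of cancellations in each dataset
mean(toy_training$canceled_service == "yes")
mean(toy_test$canceled_service == "yes")
```

Because the split is stratified, the share of cancellations should be close in both datasets. The split object itself only stores which rows go where; the data frames are created when it is passed to training() and testing().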
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Create data split object
telecom_split <- ___(___, prop = ___,
                     strata = ___)

# Create the training data
telecom_training <- ___ %>%
  ___

# Create the test data
telecom_test <- ___ %>%
  ___

# Check the number of rows
nrow(___)
nrow(___)