Split data into training and testing sets
This is the final step before building the regression model. Here, you will identify the names of the target variable and the feature columns, extract the data, and split it into training and testing sets.
The pandas and numpy libraries have been loaded as `pd` and `np`, respectively. The input features are imported as the `features` dataset, and the target variable you built in the previous exercise has been imported for you as `Y`.
This exercise is part of the course Machine Learning for Marketing in Python.
Exercise instructions
- Store the customer identifier column name as a list.
- Select the feature column names excluding the customer identifier.
- Extract the features as `X`.
- Split the data into training and testing sets using the `train_test_split()` function.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Store customer identifier column name as a list
custid = ['___']
# Select feature column names excluding customer identifier
cols = [col for col in features.___ if col not in ___]
# Extract the features as `X`
X = features[___]
# Split data to training and testing
___, test_X, train_Y, ___ = ___(X, Y, test_size=0.25, random_state=99)
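
To see how these steps fit together outside the exercise environment, here is a minimal sketch on made-up data. It is not the course's solution: the `features` DataFrame and `Y` below are synthetic stand-ins, and the column names `CustomerID`, `recency`, and `frequency` are assumptions chosen only for illustration. The sketch also assumes that `train_test_split()` comes from `sklearn.model_selection`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the course's `features` and `Y` (all values are made up)
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "CustomerID": np.arange(100),            # hypothetical identifier column name
    "recency": rng.integers(1, 90, 100),     # hypothetical feature
    "frequency": rng.integers(1, 20, 100),   # hypothetical feature
})
Y = rng.integers(0, 10, 100)

# Store the identifier column name as a list
custid = ["CustomerID"]

# Keep every column that is not the identifier
cols = [col for col in features.columns if col not in custid]

# Extract the feature matrix
X = features[cols]

# Hold out 25% of the rows for testing; random_state makes the split repeatable
train_X, test_X, train_Y, test_Y = train_test_split(
    X, Y, test_size=0.25, random_state=99
)

print(train_X.shape, test_X.shape)  # (75, 2) (25, 2)
```

With 100 rows and `test_size=0.25`, 25 rows go to the test set and 75 to the training set. The identifier is excluded because it labels a customer rather than describing their behavior, so it should not be used as a model input.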