Split data into training and testing sets
You are now ready to build an end-to-end machine learning model by following a few simple steps! You will explore modeling nuances in much more detail in the next chapters, but for now you will practice and understand the key steps.
The independent features have been loaded for you as a pandas DataFrame named X, and the dependent values as a pandas Series named Y.
Also, the train_test_split function has been loaded from the sklearn library. You will now create training and testing datasets, and then make sure the data was correctly split.
This exercise is part of the course Machine Learning for Marketing in Python
Exercise instructions
- Split X and Y into train and test sets with 25% of the data split into testing.
- Ensure that the training dataset has only 75% of original data.
- Ensure that the testing dataset has only 25% of original data.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Split X and Y into training and testing datasets
train_X, test_X, train_Y, test_Y = ___(___, ___, test_size=0.___)
# Ensure training dataset has only 75% of original X data
print(___.shape[0] / X.shape[0])
# Ensure testing dataset has only 25% of original X data
print(___.shape[0] / ___.shape[0])
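If you want to try the same steps outside the exercise environment, the following is a minimal sketch rather than the official solution. It assumes stand-in data: the 1,000-row X and Y below, the column names f1 to f3, and the random_state value are all hypothetical and are not part of the exercise, which preloads its own X and Y.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; the exercise preloads its own X and Y.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["f1", "f2", "f3"])
Y = pd.Series(rng.integers(0, 2, size=1000), name="target")

# Hold out 25% of the rows for testing.
# random_state is an added assumption so the split is reproducible.
train_X, test_X, train_Y, test_Y = train_test_split(
    X, Y, test_size=0.25, random_state=42
)

# Share of the original rows that ended up in each split.
train_share = train_X.shape[0] / X.shape[0]
test_share = test_X.shape[0] / X.shape[0]

print(train_share)  # 0.75
print(test_share)   # 0.25
```

With 1,000 rows the shares come out to exactly 0.75 and 0.25. When the row count is not divisible by four, the split sizes are rounded to whole rows, so expect values close to, but not exactly, those fractions.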