Explore churn rate and split data
Building on the overview from Chapter 1, this lesson digs deeper into the data preparation needed to predict churn with machine learning. You will explore the churn distribution and split the data into training and testing sets before you proceed to modeling. This step shows you how the churn rate is distributed and prepares the data so that you can build a model on the training set and measure its performance on the held-out testing set.
The telecom dataset has been loaded as a pandas DataFrame named telcom. The target variable column is called Churn.
This exercise is part of the course
Machine Learning for Marketing in Python
Exercise instructions
- Print the unique values in the Churn column.
- Calculate the ratio size of each churn group.
- Import the function for splitting data to train and test.
- Split the data into 75% train and 25% test.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Print the unique Churn values
print(___(telcom['Churn']))
# Calculate the ratio size of each churn group
telcom.___(['Churn']).size() / telcom.shape[0] * 100
# Import the function for splitting data to train and test
from sklearn.model_selection import ___
# Split the data into train and test
train, test = ___(telcom, test_size = .25)
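For reference, here is a minimal sketch of one way the blanks could be filled in. It is not necessarily the course's official solution. The small telcom DataFrame built here is a hypothetical stand-in for the preloaded dataset, with made-up column values. Using set for the unique values and passing random_state are assumptions made for illustration; pandas' unique() would work just as well for the first step.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the preloaded telcom DataFrame (made-up values)
telcom = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 2, 8, 22, 10],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70, 99.65, 89.10, 29.75],
    "Churn": ["No", "No", "Yes", "No", "Yes", "Yes", "No", "No"],
})

# Print the unique Churn values (set is one option; an assumption here)
churn_values = set(telcom["Churn"])
print(churn_values)

# Calculate the size of each churn group as a percentage of all rows
churn_pct = telcom.groupby(["Churn"]).size() / telcom.shape[0] * 100
print(churn_pct)

# Split the data into 75% train and 25% test
# random_state is an added assumption so the split is reproducible
train, test = train_test_split(telcom, test_size=0.25, random_state=42)
print(train.shape, test.shape)
```

The percentages show how imbalanced the target is, which is worth knowing before modeling. If the churn group is small, passing stratify=telcom["Churn"] to train_test_split keeps the churn ratio similar in both splits.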