
Explore churn rate and split data

Building on the overview from Chapter 1, this lesson digs deeper into the data preparation needed to predict churn with machine learning. You will explore how the churn rate is distributed, then split the data into training and testing sets before moving on to modeling. This lets you build a model on the training set and measure its performance on held-out testing data that the model has not seen.

The telecom dataset has been loaded as a pandas DataFrame named telcom. The target variable column is called Churn.

This exercise is part of the course

Machine Learning for Marketing in Python

Exercise instructions

  • Print the unique values in the Churn column.
  • Calculate the size of each churn group as a percentage of all rows.
  • Import the function for splitting data into train and test sets.
  • Split the data into 75% train and 25% test.
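
The first two steps can be sketched on a small stand-in DataFrame. This is only an illustration: the toy `telcom` frame below, including its `tenure` column and its values, is a hypothetical substitute for the course dataset, and `set` and `value_counts` are just two of several ways to inspect the column.

```python
import pandas as pd

# Hypothetical stand-in for the telcom DataFrame (toy values, not the course data)
telcom = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22, 10, 60],
    "Churn": ["No", "No", "Yes", "No", "Yes", "No", "No", "Yes"],
})

# Unique values in the target column
unique_values = set(telcom["Churn"])
print(unique_values)

# Size of each churn group as a percentage of all rows
churn_pct = telcom.groupby(["Churn"]).size() / telcom.shape[0] * 100
print(churn_pct)

# An equivalent way to get the same percentages
churn_pct_alt = telcom["Churn"].value_counts(normalize=True) * 100
print(churn_pct_alt)
```

In this toy frame 3 of 8 rows are churners, so the "Yes" group comes out at 37.5% and the "No" group at 62.5%. Knowing this balance up front helps when judging model accuracy later, since a model that always predicts the majority class would already match the majority share.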

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Print the unique Churn values
print(___(telcom['Churn']))

# Calculate the size of each churn group as a percentage of all rows
telcom.___(['Churn']).size() / telcom.shape[0] * 100

# Import the function for splitting data to train and test
from sklearn.model_selection import ___

# Split the data into train and test
train, test = ___(telcom, test_size = .25)
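
As a minimal sketch of the split step, the snippet below runs scikit-learn's `train_test_split` on a hypothetical 100-row stand-in for `telcom` (the `tenure` column and all values are made up for illustration). The `random_state` argument is an addition not present in the exercise; it is included here only so the split is reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the telcom DataFrame: 100 rows, every 4th row churns
telcom = pd.DataFrame({
    "tenure": range(100),
    "Churn": ["Yes" if i % 4 == 0 else "No" for i in range(100)],
})

# Hold out 25% of the rows for testing; random_state fixes the shuffle (assumption)
train, test = train_test_split(telcom, test_size=0.25, random_state=42)

# Check the resulting sizes: 75 training rows and 25 testing rows
print(train.shape, test.shape)
```

Because the rows are shuffled before splitting, the churn share in `train` and `test` will usually be close to, but not exactly, the overall share. Comparing the group percentages in both parts is a quick sanity check before modeling.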