Varying training set size
The size of your training and testing sets influences model performance. Models generally learn better when they have more training data. However, a model can overfit to its training data and fail to generalize to new data, so to evaluate its ability to generalize properly, you need enough testing data. As a result, there is an important trade-off between how much data you use for training and how much you hold out for testing.
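One way to see this trade-off is to train the same model on growing amounts of data and compare its accuracy on the training rows with its accuracy on held-out rows. The sketch below uses scikit-learn's `learning_curve`, which holds out data through 5-fold cross-validation rather than a single train/test split, so it is a swapped-in technique and not the method used in this exercise. The synthetic data from `make_classification` is a hypothetical stand-in for the telco churn data, and `LogisticRegression` is an assumed model choice.

```python
# Sketch only: make_classification stands in for the telco churn data, and
# LogisticRegression is an illustrative model, not the course's choice.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Hypothetical synthetic binary-classification data (1,000 rows, 20 features)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Fit the model on increasing fractions of each training fold (5-fold CV)
# and score it on both the rows it saw and the held-out fold.
train_sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)

for n, tr, te in zip(train_sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    # A large gap between training and held-out accuracy suggests overfitting
    print(f"{int(n):4d} training rows: train acc = {tr:.3f}, held-out acc = {te:.3f}")
```

Held-out accuracy typically rises as the training set grows, while the gap to training accuracy shows how much the model is fitting noise in the rows it has seen.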
So far, you've used 70% for training and 30% for testing. Let's now use 80% of the data for training and evaluate how that changes the model's performance.
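In `train_test_split`, the proportion held out for testing is controlled by the `test_size` parameter, and the rest of the rows go to training. The sketch below shows the row counts for the 70/30 and 80/20 splits on 100 rows; `toy_X`, `toy_y`, and `sizes` are hypothetical names used only for illustration.

```python
# Sketch only: toy_X and toy_y are hypothetical stand-ins for the
# telco features and the Churn target.
import numpy as np
from sklearn.model_selection import train_test_split

toy_X = np.arange(200).reshape(100, 2)  # 100 rows, 2 features
toy_y = np.array([0, 1] * 50)           # 100 binary labels

sizes = {}
for test_size in (0.3, 0.2):
    X_tr, X_te, y_tr, y_te = train_test_split(
        toy_X, toy_y, test_size=test_size, random_state=0
    )
    # Record (training rows, testing rows) for each proportion
    sizes[test_size] = (len(X_tr), len(X_te))
    print(f"test_size={test_size}: {len(X_tr)} training rows, {len(X_te)} testing rows")
```

With 100 rows, `test_size=0.3` yields 70 training and 30 testing rows, and `test_size=0.2` yields 80 and 20.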
This exercise is part of the course Marketing Analytics: Predicting Customer Churn in Python.
The sample code below builds the feature and target variables and splits them into training and testing sets.
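# Import train_test_split
from sklearn.model_selection import train_test_split
# telco is assumed to be a pandas DataFrame already loaded in the exercise environment
# Create feature variable: every column except the target
X = telco.drop('Churn', axis=1)
# Create target variable
y = telco['Churn']
# Create training and testing sets: hold out 20% for testing, train on the other 80%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)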
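To see how the larger training set changes performance, you can fit the same model on a 70/30 split and an 80/20 split and score each on its held-out rows. This is a minimal sketch under assumptions: `make_classification` replaces the telco data, `LogisticRegression` is an illustrative model rather than the course's, and `X_demo`, `y_demo`, and `scores` are hypothetical names.

```python
# Sketch only: synthetic data and LogisticRegression are assumptions,
# not the course's data or model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(n_samples=1000, n_features=20, random_state=42)

scores = {}
for test_size in (0.3, 0.2):
    # Fixing random_state makes each split reproducible between runs
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=test_size, random_state=42
    )
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # Accuracy on the held-out rows estimates how well the model generalizes
    scores[test_size] = model.score(X_te, y_te)
    print(f"train {len(X_tr)} / test {len(X_te)}: test accuracy = {scores[test_size]:.3f}")
```

A single split is noisy, so a small difference between the two accuracies may reflect which rows landed in the test set rather than the extra training data.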