Train-test split
In this chapter, you will keep working with the ANSUR dataset. Before you can build a model on your dataset, you should first decide on which feature you want to predict. In this case, you're trying to predict gender.
You need to extract the column holding this feature from the dataset and then split the data into a training and test set. The training set will be used to train the model and the test set will be used to check its performance on unseen data.
ansur_df has been pre-loaded for you.
This exercise is part of the course Dimensionality Reduction in Python.
Exercise instructions
- Import the train_test_split function from sklearn.model_selection.
- Assign the 'Gender' column to y.
- Remove the 'Gender' column from the DataFrame and assign the result to X.
- Set the test size to 30% to perform a 70% train and 30% test data split.
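The steps above can be sketched on a small stand-in dataset. This is only an illustration: toy_df and its height_cm and weight_kg columns are hypothetical placeholders for ansur_df, and random_state=0 is an assumption added so the split is reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for ansur_df: 10 rows, two numeric columns, one 'Gender' column
toy_df = pd.DataFrame({
    "height_cm": [160, 172, 181, 158, 169, 177, 165, 190, 155, 174],
    "weight_kg": [55, 70, 82, 52, 64, 78, 60, 95, 50, 72],
    "Gender": ["Female", "Male", "Male", "Female", "Female",
               "Male", "Female", "Male", "Female", "Male"],
})

# The column to be predicted
y = toy_df["Gender"]

# Everything except the target column; drop() returns a new DataFrame
X = toy_df.drop("Gender", axis=1)

# Hold out 30% of the rows as the test set (random_state is an assumption, for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(f"{X_test.shape[0]} rows in test set vs. {X_train.shape[0]} in training set, "
      f"{X_test.shape[1]} features.")
```

With 10 rows and test_size=0.3, 3 rows go to the test set and 7 to the training set. Because drop() does not modify toy_df in place, the original DataFrame still contains the 'Gender' column afterwards.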
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import train_test_split()
from ____.____ import ____
# Select the Gender column as the feature to be predicted (y)
y = ansur_df[____]
# Remove the Gender column to create the feature data (X)
X = ansur_df.____(____, ____)
# Perform a 70% train and 30% test data split
X_train, X_test, y_train, y_test = ____(X, y, ____=____)
print(f"{X_test.shape[0]} rows in test set vs. {X_train.shape[0]} in training set, {X_test.shape[1]} Features.")
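Once the data is split, the test set can be used to check performance on unseen data, as described at the start of the exercise. The following is a minimal sketch under stated assumptions: it uses synthetic data from make_classification rather than the ANSUR dataset, and LogisticRegression is an arbitrary choice of model, not one prescribed by this exercise.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the real features and target (assumption)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Same 70% / 30% split as in the exercise
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit on the training set only
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy on data the model has seen vs. data it has not seen
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"Training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

A test accuracy well below the training accuracy would suggest that the model has overfitted the training data.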