Train-test split

In this chapter, you will keep working with the ANSUR dataset. Before you can build a model on your dataset, you should first decide which feature you want to predict (the target). In this case, you're trying to predict gender.

You need to extract the column holding this feature from the dataset and then split the data into a training and test set. The training set will be used to train the model and the test set will be used to check its performance on unseen data.
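As a rough illustration of what such a split does, here is a minimal sketch on made-up toy data (the arrays `X_toy` and `y_toy` and the `random_state` value are assumptions for the example, not part of the exercise). It shows that `train_test_split` shuffles the rows and partitions them so that no sample ends up in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features, binary labels
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1] * 5)

# Hold out 30% of the rows as unseen test data; random_state only makes the
# shuffle reproducible and is not required by the exercise
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=0
)

# Every row lands in exactly one of the two sets
train_rows = {tuple(row) for row in X_tr}
test_rows = {tuple(row) for row in X_te}
print(f"train rows: {len(train_rows)}, test rows: {len(test_rows)}")
print(f"overlap between sets: {len(train_rows & test_rows)}")
```

Because the model never sees the test rows during training, its score on them is a fairer estimate of performance on new data.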

ansur_df has been pre-loaded for you.

This exercise is part of the course

Dimensionality Reduction in Python

Exercise instructions

  • Import the train_test_split function from sklearn.model_selection.
  • Assign the 'Gender' column to y.
  • Remove the 'Gender' column from the DataFrame and assign the result to X.
  • Set the test size to 30% to perform a 70% train and 30% test data split.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import train_test_split()
from ____.____ import ____

# Select the Gender column as the feature to be predicted (y)
y = ansur_df[____]

# Remove the Gender column to create the feature matrix (X)
X = ansur_df.____(____, ____)

# Perform a 70% train and 30% test data split
X_train, X_test, y_train, y_test = ____(X, y, ____=____)

print(f"{X_test.shape[0]} rows in test set vs. {X_train.shape[0]} in training set, {X_test.shape[1]} Features.")
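
One possible way to fill in the blanks is sketched below. Since the real `ansur_df` is only available inside the exercise environment, this version builds a small stand-in DataFrame; the columns `height_cm` and `weight_kg` and all of their values are invented for illustration and are not taken from the ANSUR data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the pre-loaded ansur_df (invented columns and values)
ansur_df = pd.DataFrame({
    "Gender": ["Male", "Female"] * 10,
    "height_cm": [170 + i for i in range(20)],
    "weight_kg": [60 + i for i in range(20)],
})

# Select the Gender column as the feature to be predicted (y)
y = ansur_df["Gender"]

# Drop the Gender column so X holds only the input features
X = ansur_df.drop("Gender", axis=1)

# Perform a 70% train and 30% test data split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

print(f"{X_test.shape[0]} rows in test set vs. {X_train.shape[0]} in training set, {X_test.shape[1]} Features.")
```

`drop()` returns a new DataFrame, so `ansur_df` itself still contains the `Gender` column afterwards. Without a fixed `random_state`, the rows assigned to each set change from run to run, while the set sizes stay the same.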