Train-test split

In this chapter, you will keep working with the ANSUR dataset. Before you can build a model on your dataset, you should first decide on which feature you want to predict. In this case, you're trying to predict gender.

You need to extract the column holding this feature from the dataset and then split the data into a training and test set. The training set will be used to train the model and the test set will be used to check its performance on unseen data.

ansur_df has been pre-loaded for you.

This exercise is part of the course

Dimensionality Reduction in Python


Exercise instructions

Import the train_test_split function from sklearn.model_selection.
Assign the 'Gender' column to y.
Remove the 'Gender' column from the DataFrame and assign the result to X.
Set the test size to 30% to perform a 70% train and 30% test data split.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Import train_test_split()
from ____.____ import ____

# Select the Gender column as the feature to be predicted (y)
y = ansur_df[____]

# Remove the Gender column to create the feature data (X)
X = ansur_df.____(____, ____)

# Perform a 70% train and 30% test data split
X_train, X_test, y_train, y_test = ____(X, y, ____=____)

print(f"{X_test.shape[0]} rows in test set vs. {X_train.shape[0]} in training set, {X_test.shape[1]} Features.")
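
To see how these steps fit together outside the exercise environment, here is a minimal sketch on a small stand-in DataFrame. The name toy_df, its columns height_cm and weight_kg, and all of its values are made up for illustration and are not taken from ansur_df. The random_state argument is also an addition, used only to make the split repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for ansur_df: 10 rows, a 'Gender' column
# and two made-up numeric feature columns
toy_df = pd.DataFrame({
    "Gender": ["Male", "Female"] * 5,
    "height_cm": [178, 165, 182, 160, 175, 168, 180, 162, 177, 166],
    "weight_kg": [80, 60, 85, 55, 78, 62, 83, 58, 79, 61],
})

# Target: the column to predict
y = toy_df["Gender"]

# Features: everything except the target column
X = toy_df.drop("Gender", axis=1)

# Hold out 30% of the rows as the test set; random_state is an
# assumption added here so the shuffle is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(f"{X_test.shape[0]} rows in test set vs. {X_train.shape[0]} in training set, {X_test.shape[1]} Features.")
```

Note that drop() returns a new DataFrame, so toy_df itself still contains the 'Gender' column after X is created.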
