Train-test split

In this chapter, you will keep working with the ANSUR dataset. Before you can build a model on your dataset, you first need to decide which feature you want to predict. In this case, you're trying to predict gender.

You need to extract the column holding this feature from the dataset and then split the data into a training set and a test set. The training set is used to train the model, and the test set is used to check its performance on unseen data.

ansur_df has been pre-loaded for you.
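
To make the idea of a held-out test set concrete, here is a minimal sketch of a manual split using plain pandas. It assumes a small hypothetical DataFrame, toy_df, as a stand-in for ansur_df; the column names and values are made up for illustration, and this is not the approach the exercise asks for.

```python
import pandas as pd

# Hypothetical stand-in for ansur_df (the values below are invented).
toy_df = pd.DataFrame({
    "Gender": ["Male", "Female"] * 5,
    "stature_m": [1.80, 1.65, 1.75, 1.60, 1.82, 1.68, 1.78, 1.62, 1.85, 1.70],
    "weight_kg": [80, 60, 75, 55, 85, 62, 78, 58, 90, 65],
})

# Hold out 30% of the rows at random as the test set.
test_df = toy_df.sample(frac=0.3, random_state=0)

# Every row that was not sampled stays in the training set.
train_df = toy_df.drop(test_df.index)

print(len(train_df), "training rows,", len(test_df), "test rows")
```

The two sets share no rows, so a model trained on train_df never sees the rows in test_df.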

This exercise is part of the course

Dimensionality Reduction in Python

Exercise instructions

  • Import the train_test_split function from sklearn.model_selection.
  • Assign the 'Gender' column to y.
  • Remove the 'Gender' column from the DataFrame and assign the result to X.
  • Set the test size to 30% to perform a 70% train and 30% test data split.
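
The steps above can be sketched on a small example. This is an illustration under stated assumptions, not the graded solution: toy_df is a hypothetical stand-in for ansur_df with invented values, and random_state is added only to make the shuffle reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for ansur_df (the values below are invented).
toy_df = pd.DataFrame({
    "Gender": ["Male", "Female"] * 5,
    "stature_m": [1.80, 1.65, 1.75, 1.60, 1.82, 1.68, 1.78, 1.62, 1.85, 1.70],
    "weight_kg": [80, 60, 75, 55, 85, 62, 78, 58, 90, 65],
})

# The column to predict becomes the target vector y.
y = toy_df["Gender"]

# All remaining columns form the feature matrix X.
X = toy_df.drop("Gender", axis=1)

# test_size=0.3 sends 30% of the rows to the test set and 70% to training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(f"{X_test.shape[0]} rows in test set vs. {X_train.shape[0]} in training set, "
      f"{X_test.shape[1]} features.")
```

train_test_split shuffles the rows by default and keeps X and y aligned, so each row of X_train still matches its label in y_train.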

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Import train_test_split()
from ____.____ import ____

# Select the Gender column as the feature to be predicted (y)
y = ansur_df[____]

# Remove the Gender column to create the feature data (X)
X = ansur_df.____(____, ____)

# Perform a 70% train and 30% test data split
X_train, X_test, y_train, y_test = ____(X, y, ____=____)

print(f"{X_test.shape[0]} rows in test set vs. {X_train.shape[0]} in training set, {X_test.shape[1]} Features.")