
Train and testing transformations (I)

So far you have created scalers based on a column and then applied each scaler to the same data it was trained on. When building machine learning models, you will generally fit your models on historical data (the train set) and apply them to new, unseen data (the test set). In these cases you need to ensure that the same scaling is applied to both the training and the test data.
To do this in practice, you fit the scaler on the train set and keep the fitted scaler so you can apply it to the test set. You should never refit a scaler on the test set, because the test data would then be scaled with different statistics than the ones the model saw during training.
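As a minimal sketch of why this matters, the snippet below fits a StandardScaler on a small train array and reuses it on a test array, then contrasts that with refitting on the test data. The age values and the variable names (train_age, test_age, and so on) are invented for illustration and are not part of the course data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical ages, invented for illustration only
train_age = np.array([[20.0], [30.0], [40.0]])
test_age = np.array([[30.0], [50.0]])

# Fit on the train set: the scaler stores the train mean and standard deviation
scaler = StandardScaler()
scaler.fit(train_age)

# Transform the test set with the statistics learned from the train set
test_scaled = scaler.transform(test_age)

# The same result computed by hand from the stored train statistics
manual = (test_age - scaler.mean_) / scaler.scale_

# What NOT to do: refitting on the test set uses the test mean and
# standard deviation, so the values no longer match the training scale
refit_scaled = StandardScaler().fit_transform(test_age)

print("train mean:", scaler.mean_[0])
print("scaled with train statistics:", test_scaled.ravel())
print("scaled after refitting on test:", refit_scaled.ravel())
```

The two printed arrays differ: the same test ages map to different scaled values depending on which data the scaler was fitted on.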

For this exercise and the next, we split the so_numeric_df DataFrame into train (so_train_numeric) and test (so_test_numeric) sets.

This exercise is part of the course

Feature Engineering for Machine Learning in Python


Exercise instructions

  • Instantiate the StandardScaler() as SS_scaler.
  • Fit the StandardScaler on the Age column of the train set (so_train_numeric).
  • Transform the Age column in the test set (so_test_numeric).

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import StandardScaler
from sklearn.preprocessing import StandardScaler

# Instantiate the standard scaler
SS_scaler = ____

# Fit the standard scaler to the training data
____

# Transform the test data using the fitted scaler
so_test_numeric['Age_ss'] = ____
print(so_test_numeric[['Age', 'Age_ss']].head())
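One possible way to complete the blanks is sketched below. The two small DataFrames are hypothetical stand-ins so that the snippet runs on its own; in the exercise, so_train_numeric and so_test_numeric are already provided and contain the real survey data.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; the course supplies the real DataFrames
so_train_numeric = pd.DataFrame({'Age': [21, 35, 28, 44, 52]})
so_test_numeric = pd.DataFrame({'Age': [30, 61, 19]})

# Instantiate the standard scaler
SS_scaler = StandardScaler()

# Fit the scaler on the Age column of the train set only
# (double brackets keep the 2-D shape that scikit-learn expects)
SS_scaler.fit(so_train_numeric[['Age']])

# Transform the Age column of the test set with the fitted scaler;
# ravel() flattens the (n, 1) result into a single column of values
so_test_numeric['Age_ss'] = SS_scaler.transform(so_test_numeric[['Age']]).ravel()
print(so_test_numeric[['Age', 'Age_ss']].head())
```

Note that fit is called only once, on the train set, and transform alone is called on the test set.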