Train and testing transformations (I)
So far you have created scalers based on a column, and then applied the scaler to the same data that it was trained on. When creating machine learning models you will generally build your models on historic data (train set) and apply your model to new unseen data (test set). In these cases you will need to ensure that the same scaling is being applied to both the training and test data.
To do this in practice, you fit the scaler on the train set and keep the fitted scaler so you can apply it to the test set. You should never refit a scaler on the test set.
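To see why refitting on the test set is a problem, here is a minimal sketch of what fitting and transforming amount to for standard scaling, written with plain NumPy rather than the scaler itself. The arrays `train_age` and `test_age` are made-up values for illustration, not the course data.

```python
import numpy as np

# Hypothetical ages standing in for a train/test split (not the course data)
train_age = np.array([20.0, 30.0, 40.0, 50.0])
test_age = np.array([25.0, 60.0])

# "Fitting" a standard scaler amounts to storing the training mean and std.
# np.std defaults to the population std (ddof=0), as StandardScaler does.
train_mean = train_age.mean()
train_std = train_age.std()

# "Transforming" reuses those stored training statistics on new data
test_scaled = (test_age - train_mean) / train_std

# Refitting on the test set would use the test set's own statistics instead,
# so the same raw age would map to a different scaled value
test_refit = (test_age - test_age.mean()) / test_age.std()

print("scaled with train statistics:", test_scaled)
print("scaled with test statistics: ", test_refit)
```

With only two test values, refitting always maps them to -1 and 1 whatever the ages are, so the scaled test data would no longer be comparable to the scaled training data.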
For this exercise and the next, we split the so_numeric_df DataFrame into train (so_train_numeric) and test (so_test_numeric) sets.
This exercise is part of the course
Feature Engineering for Machine Learning in Python
Exercise instructions
- Instantiate the StandardScaler() as SS_scaler.
- Fit the StandardScaler on the Age column.
- Transform the Age column in the test set (so_test_numeric).
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import StandardScaler
from sklearn.preprocessing import StandardScaler
# Instantiate the standard scaler
SS_scaler = ____
# Fit the standard scaler to the Age column of the training data
____
# Transform the test data using the fitted scaler
so_test_numeric['Age_ss'] = ____
print(so_test_numeric[['Age', 'Age_ss']].head())
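One possible way to fill in the blanks is sketched below. It is not necessarily the course's reference solution. The two small DataFrames are hypothetical stand-ins for so_train_numeric and so_test_numeric, which the exercise environment normally provides, and the `.ravel()` call is an added precaution that flattens the scaler's (n, 1) output before it is assigned to a single column.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the course's pre-split DataFrames
so_train_numeric = pd.DataFrame({'Age': [21, 38, 45, 46, 39]})
so_test_numeric = pd.DataFrame({'Age': [35, 18, 47]})

# Instantiate the standard scaler
SS_scaler = StandardScaler()

# Fit on the training data only; double brackets keep the input 2-D
SS_scaler.fit(so_train_numeric[['Age']])

# Transform the test data with the scaler fitted on the training data
so_test_numeric['Age_ss'] = SS_scaler.transform(so_test_numeric[['Age']]).ravel()

print(so_test_numeric[['Age', 'Age_ss']].head())
```

Because the scaler was fitted on the train set, the Age_ss values in the test set are centred on the training mean, so they will generally not have a mean of exactly 0 or a standard deviation of exactly 1.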