Standard scaling
Standard scaling transforms numerical features to have a mean of 0 and a variance of 1. In this exercise, you will apply standard scaling using StandardScaler() from sklearn. First, you will select only the relevant columns to scale, by filtering for numerical columns and then excluding some of them based on knowledge of the data. This filtering is already provided and uses regular expressions, which allow partial string matches. Then you will use fit_transform() to transform the relevant columns.
The pandas module is available as pd in your workspace and the sample DataFrame is loaded as df. Additionally, the hour column is already converted to a datetime, and StandardScaler from sklearn.preprocessing is available.
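To make the transformation concrete, here is a minimal sketch of the arithmetic behind standard scaling, using only the Python standard library. The helper standard_scale and the sample values are hypothetical and only for illustration. The sketch divides by the population standard deviation, which is also what StandardScaler does.

```python
import statistics

def standard_scale(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (divides by n)
    return [(x - mean) / std for x in values]

# Hypothetical feature values, not taken from the exercise data
values = [20.0, 30.0, 40.0, 50.0]
scaled = standard_scale(values)

print(scaled)
print(statistics.fmean(scaled))      # approximately 0
print(statistics.pvariance(scaled))  # approximately 1
```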
This exercise is part of the course
Predicting CTR with Machine Learning in Python
Exercise instructions
- Select the numerical columns using .select_dtypes(), and filter them with the given filter_cols.
- Apply standard scaling to the relevant columns by first creating a StandardScaler() and then using .fit_transform().
- Print the variance of the newly transformed columns using .var().
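The column selection step can be sketched on a small toy DataFrame. Everything here is an assumption for illustration: the data, the variable names, and the use of include="number" instead of the exercise's ['int', 'float'] list. It shows two common ways to drop columns by name: an exact match with .isin() and a partial match with a regular expression via .str.contains().

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration
toy = pd.DataFrame({
    "click": [0, 1, 0],
    "banner_pos": [0, 1, 0],
    "device_type": [1, 1, 4],
    "width": [320.0, 300.0, 728.0],
    "height": [50.0, 250.0, 90.0],
    "site_id": ["a", "b", "c"],
})

# Keep only numeric columns (drops the string column site_id)
num_toy = toy.select_dtypes(include="number")

# Option 1: exclude columns whose names match exactly
exclude = ["click", "banner_pos", "device_type"]
exact_cols = num_toy.columns[~num_toy.columns.isin(exclude)]

# Option 2: exclude columns whose names contain any of these substrings
pattern = "click|pos|type"
regex_cols = num_toy.columns[~num_toy.columns.str.contains(pattern)]

print(list(exact_cols))
print(list(regex_cols))
```

Exact matching is safer when the column names are known in advance, while a regular expression can cover a whole family of similarly named columns with one pattern.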
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Get non-categorical columns, with a filter
num_df = df.____(include=['int', 'float'])
filter_cols = ['click', 'banner_pos', 'device_type',
               'search_engine_type', 'product_type', 'advertiser_type']
new_df = num_df[num_df.columns[~num_df.columns.____(filter_cols)]]
num_cols = new_df.____
# Transform columns using StandardScaler
scaler = ____()
df[num_cols] = scaler.____(df[____])
# Print mean and variance of transformed columns
print(df[num_cols].mean())
print(df[num_cols].____)
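As a rough illustration of what the printed output should look like, the sketch below scales a hypothetical toy DataFrame rather than the exercise's df. It also shows one detail worth knowing when reading the result: StandardScaler divides by the population standard deviation, whereas pandas .var() defaults to the sample variance (ddof=1), so the printed variance is n / (n - 1) rather than exactly 1. On a large dataset the difference is negligible.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; column names and values are for illustration only
toy = pd.DataFrame({
    "width": [320.0, 300.0, 728.0, 468.0],
    "height": [50.0, 250.0, 90.0, 60.0],
})
cols = ["width", "height"]

# Fit the scaler on the selected columns and overwrite them with scaled values
scaler = StandardScaler()
toy[cols] = scaler.fit_transform(toy[cols])

print(toy[cols].mean())        # approximately 0 for each column
print(toy[cols].var())         # sample variance (ddof=1): n / (n - 1)
print(toy[cols].var(ddof=0))   # population variance: approximately 1
```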