Encode categorical and scale numerical variables
In this final step, you will perform one-hot encoding on the categorical variables and then scale the numerical columns. The pandas library has been loaded for you as pd, as well as the StandardScaler class from the sklearn.preprocessing module.
The raw telecom churn dataset telco_raw has been loaded for you as a pandas DataFrame, as well as the lists custid, target, categorical, and numerical, which hold the column names you created in the previous exercise. You can familiarize yourself with the dataset by exploring it in the console.
This is a part of the course “Machine Learning for Marketing in Python”.
Exercise instructions
- Perform one-hot encoding on the categorical variables.
- Initialize a StandardScaler instance.
- Fit and transform the scaler on the numerical columns.
- Build a DataFrame from scaled_numerical.
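To make the first instruction concrete, here is a minimal sketch of one-hot encoding with drop_first=True. It uses a small made-up DataFrame rather than telco_raw: the names toy, toy_categorical, and the columns customer, contract, and tenure are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data; these column names are illustrative, not taken from telco_raw
toy = pd.DataFrame({
    "customer": ["a", "b", "c", "d"],
    "contract": ["monthly", "yearly", "monthly", "two_year"],
    "tenure": [1, 24, 5, 60],
})
toy_categorical = ["contract"]

# Each listed column is replaced by one indicator column per category.
# drop_first=True drops the first category (in sorted order) to avoid a redundant column.
encoded = pd.get_dummies(data=toy, columns=toy_categorical, drop_first=True)
encoded_columns = list(encoded.columns)
print(encoded_columns)
```

Here the category monthly is dropped, so a row with zeros (or False) in both remaining contract columns stands for a monthly contract. Columns not listed in columns are passed through unchanged.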
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Perform one-hot encoding on the categorical variables
telco_raw = pd.get_dummies(data=___, columns=categorical, drop_first=True)
# Initialize StandardScaler instance
scaler = ___()
# Fit and transform the scaler on numerical columns
scaled_numerical = ___.fit_transform(telco_raw[___])
# Build a DataFrame from scaled_numerical
scaled_numerical = pd.DataFrame(___, columns=numerical)
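The scaling steps can be checked on a small example. The sketch below is not the exercise solution: it assumes a made-up DataFrame (toy, toy_numerical, and the columns tenure and monthly_charges are hypothetical names) and shows that fit_transform returns a NumPy array, which is why the array is wrapped back into a DataFrame with named columns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; these column names are illustrative, not taken from telco_raw
toy = pd.DataFrame({
    "tenure": [1.0, 24.0, 5.0, 60.0],
    "monthly_charges": [20.0, 80.0, 55.0, 105.0],
})
toy_numerical = ["tenure", "monthly_charges"]

scaler = StandardScaler()

# fit_transform learns each column's mean and standard deviation, then applies
# (x - mean) / std; the result is a plain NumPy array without column labels
scaled = scaler.fit_transform(toy[toy_numerical])

# Restore the column labels by building a DataFrame from the array
scaled_df = pd.DataFrame(scaled, columns=toy_numerical)

# StandardScaler uses the population standard deviation, hence ddof=0 here
col_means = scaled_df.mean()
col_stds = scaled_df.std(ddof=0)
print(col_means.round(6), col_stds.round(6))
```

Each scaled column should have a mean of approximately 0 and a standard deviation of approximately 1. Note that a DataFrame built this way gets a fresh 0 to n-1 index; if the original index matters, it can be passed explicitly with index=toy.index.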