Encode categorical and scale numerical variables
In this final step, you will perform one-hot encoding on the categorical variables and then scale the numerical columns. The pandas library has been loaded for you as pd, as well as the StandardScaler class from the sklearn.preprocessing module.
The raw telecom churn dataset telco_raw has been loaded for you as a pandas DataFrame, as well as the lists custid, target, categorical, and numerical, which contain the column names you created in the previous exercise. You can familiarize yourself with the dataset by exploring it in the console.
This exercise is part of the course Machine Learning for Marketing in Python.
Exercise instructions
- Perform one-hot encoding on the categorical variables.
- Initialize a StandardScaler instance.
- Fit and transform the scaler on the numerical columns.
- Build a DataFrame from scaled_numerical.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Perform one-hot encoding on the categorical variables
telco_raw = pd.get_dummies(data = ___, columns = categorical, drop_first=True)
# Initialize StandardScaler instance
scaler = ___()
# Fit and transform the scaler on numerical columns
scaled_numerical = ___.fit_transform(telco_raw[___])
# Build a DataFrame from scaled_numerical
scaled_numerical = pd.DataFrame(___, columns=numerical)
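
To see what these steps produce, here is a minimal sketch on a small made-up DataFrame. It is not the exercise solution and does not use telco_raw: the toy data and the column names customerID, Contract, tenure, and MonthlyCharges are assumptions chosen only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data standing in for telco_raw (hypothetical values and column names)
toy = pd.DataFrame({
    "customerID": ["a1", "b2", "c3", "d4"],
    "Contract": ["Monthly", "Yearly", "Monthly", "Yearly"],
    "tenure": [1, 24, 12, 60],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
})
categorical = ["Contract"]
numerical = ["tenure", "MonthlyCharges"]

# One-hot encode the categorical column; drop_first=True drops the first
# category level ("Monthly" here), leaving a single Contract_Yearly column
toy = pd.get_dummies(data=toy, columns=categorical, drop_first=True)

# Initialize the scaler, then fit it and transform the numerical columns
scaler = StandardScaler()
scaled_numerical = scaler.fit_transform(toy[numerical])  # NumPy array

# Wrap the array in a DataFrame so the column names are kept
scaled_numerical = pd.DataFrame(scaled_numerical, columns=numerical)

print(toy.columns.tolist())
print(scaled_numerical.round(2))
```

After scaling, each numerical column has mean 0 and unit variance. StandardScaler uses the population standard deviation (ddof=0), so a check with pandas needs .std(ddof=0) rather than the default .std().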