Separate numerical and categorical columns
In the last exercise, you explored the dataset characteristics and are now ready to do some data pre-processing. You will now separate categorical and numerical variables from the telco_raw DataFrame with a customized categorical vs. numerical unique value count threshold. The pandas module has been loaded for you as pd.
The raw telecom churn dataset telco_raw has been loaded for you as a pandas DataFrame. You can familiarize yourself with the dataset by exploring it in the console.
This exercise is part of the course
Machine Learning for Marketing in Python
Exercise instructions
- Store the customerID and Churn column names.
- Assign to categorical the column names that have fewer than 5 unique values.
- Remove target from the list.
- Assign to numerical all column names that are not in custid, target, or categorical.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Store customerID and Churn column names
custid = ['___']
target = ['___']
# Store categorical column names
categorical = telco_raw.___()[telco_raw.nunique() < ___].keys().tolist()
# Remove target from the list of categorical variables
categorical.remove(___[0])
# Store numerical column names
numerical = [x for x in telco_raw.___ if x not in custid + ___ + categorical]
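
The same pattern can be tried outside the exercise on a small stand-in dataset. The sketch below is only an illustration: telco_toy, its columns and values, and the threshold variable are hypothetical and are not taken from telco_raw. It relies on DataFrame.nunique() returning a Series of unique value counts indexed by column name, which can then be filtered with a boolean mask.

```python
import pandas as pd

# Hypothetical toy stand-in for telco_raw (made-up rows and columns).
telco_toy = pd.DataFrame({
    "customerID": ["A1", "B2", "C3", "D4", "E5", "F6"],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "Contract": ["Monthly", "One year", "Two year", "Monthly", "Monthly", "One year"],
    "tenure": [1, 34, 2, 45, 8, 22],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 99.65, 89.10],
    "Churn": ["No", "No", "Yes", "No", "Yes", "No"],
})

# Identifier and target column names, kept as one-element lists
custid = ["customerID"]
target = ["Churn"]

# Assumed threshold: columns with fewer unique values are treated as categorical
threshold = 5

# nunique() gives one unique-value count per column, indexed by column name
unique_counts = telco_toy.nunique()
categorical = unique_counts[unique_counts < threshold].keys().tolist()

# The target has only two unique values, so it lands in the list and must be removed
categorical.remove(target[0])

# Everything that is not an ID, the target, or categorical is treated as numerical
numerical = [col for col in telco_toy.columns
             if col not in custid + target + categorical]

print(categorical)  # ['gender', 'Contract']
print(numerical)    # ['tenure', 'MonthlyCharges']
```

Note that the split depends only on unique value counts, not on dtypes, so a numeric column with very few distinct values (for example a 0/1 flag) would be classified as categorical under this rule.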