Categorical encodings
Your colleague has converted the columns in the credit dataset to numeric values using LabelEncoder()
. He left one out: credit_history
, which records the credit history of the applicant. You want to create two versions of the dataset. One will use LabelEncoder()
and another one-hot encoding, for comparison purposes. The feature matrix is available to you as credit
. You have LabelEncoder()
preloaded and pandas
as pd
.
This exercise is part of the course
Designing Machine Learning Workflows in Python
Exercise instructions
- Encode
credit_history
usingLabelEncoder()
. - Concatenate the result to the original frame.
- Create a new data frame by concatenating the 1-hot encoding dummies to the original frame.
- Confirm that 1-hot encoding produces more columns than label encoding.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create numeric encoding for credit_history
credit_history_num = ____.____(
credit[____])
# Create a new feature matrix including the numeric encoding
X_num = pd.concat([X, pd.Series(____)], ____)
# Create new feature matrix with dummies for credit_history
X_hot = pd.concat(
[X, ____.____(credit[____])], ____)
# Compare the number of features of the resulting DataFrames
print(X_hot.shape[____] > X_num.shape[____])