Categorical encodings

Your colleague has converted the columns in the credit dataset to numeric values using LabelEncoder(), but left one out: credit_history, which records the applicant's credit history. You want to create two versions of the dataset for comparison purposes: one that encodes this column with LabelEncoder() and one that uses one-hot encoding. The feature matrix is available to you as credit. LabelEncoder() is preloaded, and pandas is imported as pd.
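To make the difference between the two encodings concrete, here is a minimal sketch on a toy column. The category values below are invented for illustration and are not taken from the credit dataset; the sketch also imports LabelEncoder from scikit-learn explicitly rather than relying on it being preloaded.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical credit_history values, invented for this illustration
history = pd.Series(["good", "poor", "critical", "good"], name="credit_history")

# Label encoding: a single integer column; classes are numbered in sorted order
labels = LabelEncoder().fit_transform(history)

# One-hot encoding: one indicator column per distinct category
dummies = pd.get_dummies(history)

print(labels)         # one integer per row
print(dummies.shape)  # (rows, number of distinct categories)
```

Label encoding keeps the feature in one column but implies an ordering among the categories, whereas one-hot encoding avoids that at the cost of extra columns.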

This exercise is part of the course

Designing Machine Learning Workflows in Python

Exercise instructions

  • Encode credit_history using LabelEncoder().
  • Concatenate the encoded column to the original frame.
  • Create a new data frame by concatenating the one-hot encoding dummies to the original frame.
  • Confirm that one-hot encoding produces more columns than label encoding.

Interactive exercise

Complete the sample code to successfully finish this exercise.

# Create numeric encoding for credit_history
credit_history_num = ____.____(
  credit[____])

# Create a new feature matrix including the numeric encoding
X_num = pd.concat([X, pd.Series(____)], ____)

# Create new feature matrix with dummies for credit_history
X_hot = pd.concat(
  [X, ____.____(credit[____])], ____)

# Compare the number of features of the resulting DataFrames
print(X_hot.shape[____] > X_num.shape[____])
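
For reference, the same workflow can be sketched end to end on a small made-up frame. This is an illustration under stated assumptions, not the exercise's official solution: the data values, the amount column, and the way X is derived from credit are all hypothetical, since the exercise does not show how those objects are built.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the exercise data (all values invented)
credit = pd.DataFrame({
    "amount": [1200, 5400, 800, 3100],
    "credit_history": ["good", "poor", "critical", "good"],
})
# Assumption: X holds the columns that are already numeric
X = credit.drop(columns="credit_history")

# Numeric (label) encoding of the remaining categorical column
credit_history_num = LabelEncoder().fit_transform(credit["credit_history"])

# Append the encoded column side by side (axis=1 concatenates columns)
X_num = pd.concat(
    [X, pd.Series(credit_history_num, name="credit_history_num")], axis=1)

# Append one indicator column per category instead
X_hot = pd.concat([X, pd.get_dummies(credit["credit_history"])], axis=1)

# Compare the number of features (columns) in the two frames
print(X_hot.shape[1] > X_num.shape[1])
```

Note that pd.concat aligns on the index, so the sketch relies on X and the new Series both carrying the default integer index.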