
Converting categorical variables

Because sklearn requires numerical features as model inputs, categorical variables must be encoded as numbers. The most common technique, called "one-hot encoding", is straightforward but memory-intensive, since it creates one new column for every distinct category. Instead, you will apply hashing to each categorical column, which maps every categorical value to an integer.
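To make the memory trade-off concrete, here is a minimal sketch comparing the two encodings. The `sample` DataFrame, its `device_id` column, and its values are hypothetical and not part of the exercise data; `pd.get_dummies` is used as one common way to one-hot encode in pandas, and Python's built-in `hash` is assumed as the hash function.

```python
import pandas as pd

# Hypothetical sample data; the column name and values are illustrative only.
sample = pd.DataFrame({"device_id": ["a1", "b2", "c3", "a1", "d4"]})

# One-hot encoding adds one column per distinct category.
one_hot = pd.get_dummies(sample["device_id"])

# Hashing keeps a single integer column, however many categories there are.
hashed = sample["device_id"].apply(lambda x: hash(x))

print(one_hot.shape)  # (5, 4): four distinct categories, four columns
print(hashed.shape)   # (5,): still one column
```

With thousands of distinct values per column, the one-hot frame grows by thousands of columns, while the hashed column stays a single integer per row.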

The pandas module is available as pd in your workspace and the sample DataFrame is loaded as df.

This exercise is part of the course

Predicting CTR with Machine Learning in Python


Exercise instructions

  • Select the categorical columns by filtering for data type.
  • Apply a hash function over each of the categorical columns.

Hands-on interactive exercise

Try to solve this exercise by completing the sample code.

# Get categorical columns
categorical_cols = df.____(
  include = [____]).columns.tolist()
print("Categorical columns: ")
print(categorical_cols)

# Iterate over categorical columns and apply hash function
for col in ____:
    df[col] = df[col].____(lambda x: ____(x))

# Print examples of new output
print(df.head(5))
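
For reference, one way the blanks might be filled in is sketched below. It is not necessarily the course's official solution: it assumes the categorical columns are stored with the `object` dtype, that `select_dtypes` and `apply` are the intended methods, and that Python's built-in `hash` is the intended hash function. The small `df` built here is a hypothetical stand-in for the workspace DataFrame, with made-up column names and values.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's df; columns and values are made up.
df = pd.DataFrame({
    "click": [0, 1, 0],
    "site_id": pd.Series(["s1", "s2", "s1"], dtype="object"),
    "app_category": pd.Series(["news", "games", "news"], dtype="object"),
})

# Get categorical columns (assumed here to be those stored with object dtype)
categorical_cols = df.select_dtypes(include=["object"]).columns.tolist()
print("Categorical columns: ")
print(categorical_cols)

# Iterate over categorical columns and apply the built-in hash function
for col in categorical_cols:
    df[col] = df[col].apply(lambda x: hash(x))

# Print examples of new output
print(df.head(5))
```

Note that Python salts the built-in `hash` for strings per interpreter process unless `PYTHONHASHSEED` is set, so the integers printed will generally differ between runs. Within a single run, equal category values always map to the same integer.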