Ordinal encoding of a DataFrame
Categorical features can be encoded with two techniques: one-hot encoding and ordinal encoding. In one-hot encoding, each category becomes its own column; for each row, the column matching that row's category is 1 and all the others are 0. In ordinal encoding, the categories are mapped to integer values from 0 to the number of categories minus 1.
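To make the difference concrete, here is a minimal sketch on a made-up column. The DataFrame df and its size column are hypothetical and not part of the course data; the sketch assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data, not the course's users DataFrame
df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# One-hot encoding: one indicator column per category,
# with exactly one 1 in each row
one_hot = pd.get_dummies(df["size"], dtype=int)

# Ordinal encoding: each category becomes an integer code
# between 0 and (number of categories - 1)
encoder = OrdinalEncoder()
ordinal = encoder.fit_transform(df[["size"]])

print(one_hot)
print(ordinal.ravel())
print(encoder.categories_)
```

Note that, by default, OrdinalEncoder assigns codes in sorted order of the category values, so here the codes follow alphabetical order rather than any notion of small < medium < large.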
In this exercise, you will loop over all the columns in the users DataFrame to ordinally encode the categories. You will also store an encoder for each column in a dictionary, ordinal_enc_dict, so that the encoded columns can be converted back to the original categories.
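The reason for keeping a fitted encoder per column is that the encoder remembers the category-to-integer mapping. A small sketch of the round trip, assuming scikit-learn's OrdinalEncoder and a hypothetical column named pet (the dictionary name enc_dict is also illustrative):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single column of categories, shaped (n_samples, 1)
values = np.array(["cat", "dog", "cat", "bird"], dtype=object).reshape(-1, 1)

# Keep the fitted encoder in a dictionary keyed by column name
enc_dict = {"pet": OrdinalEncoder()}
encoded = enc_dict["pet"].fit_transform(values)

# The stored encoder can map the integer codes back to the categories
decoded = enc_dict["pet"].inverse_transform(encoded)

print(encoded.ravel())
print(decoded.ravel())
```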
This exercise is part of the course Dealing with Missing Data in Python
Exercise instructions
- Define an empty dictionary ordinal_enc_dict.
- Create an ordinal encoder object for each column.
- Select the non-null values of each column in users and encode them.
- Assign the encoded values back to the non-null entries of each column (col_name) in users.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create an empty dictionary ordinal_enc_dict
ordinal_enc_dict = ___

for col_name in users:
    # Create Ordinal encoder for col
    ordinal_enc_dict[col_name] = ___
    col = users[col_name]

    # Select non-null values of col
    col_not_null = ___
    reshaped_vals = col_not_null.values.reshape(-1, 1)
    encoded_vals = ordinal_enc_dict[col_name].fit_transform(reshaped_vals)

    # Select the non-null values for the column col_name in users and store the encoded values
    users.loc[___, ___] = np.squeeze(encoded_vals)
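
For reference, the same pattern can be sketched end to end on a small stand-in DataFrame. This is one possible way to carry out the steps, not necessarily the course's reference solution. The names users_demo and enc_dict and the columns drink and diet are hypothetical, and the sketch assumes pandas, NumPy, and scikit-learn's OrdinalEncoder.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the users DataFrame, with missing values.
# dtype=object lets the columns hold the numeric codes after encoding.
users_demo = pd.DataFrame(
    {
        "drink": ["tea", np.nan, "coffee", "tea"],
        "diet": ["vegan", "omnivore", np.nan, "vegan"],
    },
    dtype=object,
)

enc_dict = {}
for col_name in users_demo:
    # One encoder per column, stored for later decoding
    enc_dict[col_name] = OrdinalEncoder()
    col = users_demo[col_name]

    # Encode only the non-null values; missing entries stay missing
    not_null = col.notnull()
    reshaped_vals = col[not_null].values.reshape(-1, 1)
    encoded_vals = enc_dict[col_name].fit_transform(reshaped_vals)

    # Write the codes back into the rows that were non-null
    users_demo.loc[not_null, col_name] = np.squeeze(encoded_vals)

print(users_demo)
```

Fitting on the non-null values only keeps the missing entries out of the category list, so they remain NaN in the encoded DataFrame and can be imputed afterwards.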