Ordinal encoding of a categorical column
Imputing categorical values involves a few more steps than imputing numerical values. You first need to convert them to numerical values, because statistical operations cannot be performed on strings.
You will use the user profile dataset, which contains customer preferences and choices recorded by a restaurant. It contains only categorical features. In this exercise, you will convert the categorical column 'ambience' to a numerical one using OrdinalEncoder from sklearn. The DataFrame has been loaded for you as users. The OrdinalEncoder() class has also been loaded.
The head() and tail() of the users DataFrame have been printed for you.
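To make the encoding step concrete, here is a minimal sketch of what OrdinalEncoder does to a single column of strings. The category values below are hypothetical and are not taken from the users dataset. The encoder expects a 2-D input (one column per feature) and, by default, assigns integer codes to categories in sorted order.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy values, shaped as one column with four rows
values = np.array([["family"], ["friends"], ["solitary"], ["family"]])

encoder = OrdinalEncoder()
codes = encoder.fit_transform(values)

# Categories learned for the first (and only) column, in sorted order
print(encoder.categories_[0])
# One float code per row, indexing into the sorted categories
print(codes.ravel())
```

The codes come back as floats, which is convenient later because missing values (NaN) are floats as well.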
This exercise is part of the course
Dealing with Missing Data in Python
Exercise instructions
- Create the ordinal encoder object and assign it to ambience_ord_enc.
- Select the non-missing values of the 'ambience' column in users.
- Reshape ambience_not_null to shape (-1, 1).
- Replace the non-missing values of ambience with their encoded values.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Set col_name to 'ambience'
col_name = 'ambience'
# Create Ordinal encoder
ambience_ord_enc = ___
# Select non-null values of ambience column in users
ambience = users[col_name]
ambience_not_null = ___
# Reshape ambience_not_null to shape (-1, 1)
reshaped_vals = ___
# Select the non-null values for the column col_name in users and store the encoded values
encoded_vals = ambience_ord_enc.fit_transform(reshaped_vals)
users.loc[___, col_name] = np.squeeze(encoded_vals)
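The following is one possible way to carry out the same steps end to end, shown as a sketch rather than the official solution. It assumes a small hypothetical stand-in for users (the real dataset is not reproduced here), and the name not_null_mask is introduced only for illustration. The column is created with dtype=object so that float codes can be written back next to the remaining missing values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for the users DataFrame, with two missing values
users = pd.DataFrame({
    "ambience": pd.Series(
        ["family", np.nan, "friends", "solitary", np.nan, "family"],
        dtype=object,
    )
})

col_name = "ambience"

# Create the ordinal encoder
ambience_ord_enc = OrdinalEncoder()

# Select the non-null values of the column
ambience = users[col_name]
not_null_mask = ambience.notnull()  # illustrative helper name
ambience_not_null = ambience[not_null_mask]

# Reshape to a single column, as the encoder expects 2-D input
reshaped_vals = ambience_not_null.values.reshape(-1, 1)

# Encode and write the codes back only where a value was present
encoded_vals = ambience_ord_enc.fit_transform(reshaped_vals)
users.loc[not_null_mask, col_name] = np.squeeze(encoded_vals)

print(users)
```

Fitting the encoder only on the non-missing rows leaves the NaN entries untouched, so they can still be imputed afterwards.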