One-hot encoding credit data

It's time to prepare the non-numeric columns so they can be added to your LogisticRegression() model.

Once one-hot encoding has created the new columns, you can concatenate them with the numeric columns. The resulting data frame will be used throughout the rest of the course for predicting the probability of default.

Remember to one-hot encode only the non-numeric columns. One-hot encoding creates a new column for every distinct value, so applying it to the numeric columns would create an incredibly wide data set!
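To make the width problem concrete, here is a minimal sketch on a toy data frame. The column names and values are hypothetical and are not taken from cr_loan_clean; the example only assumes that pandas is installed.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only
toy = pd.DataFrame({
    "loan_amnt": [1000, 2500, 4000, 5500],
    "loan_grade": ["A", "B", "A", "C"],
})

# Encoding the text column gives one indicator column per category
grade_onehot = pd.get_dummies(toy[["loan_grade"]])
print(grade_onehot.columns.tolist())

# Passing a numeric column as a Series forces it through the encoder,
# producing one column per distinct value
amount_onehot = pd.get_dummies(toy["loan_amnt"], prefix="loan_amnt")
print(amount_onehot.shape)
```

With only four rows the numeric column already expands to four columns, one per distinct amount. On a real loan table with thousands of distinct amounts, the same call would add thousands of columns.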

The credit loan data, cr_loan_clean, has already been loaded in the workspace.

This exercise is part of the course

Credit Risk Modeling in Python


Exercise instructions

  • Create a data set for all the numeric columns called cred_num and one for the non-numeric columns called cred_str.
  • Use one-hot encoding on cred_str to create a new data set called cred_str_onehot.
  • Concatenate cred_num with the new one-hot encoded data column-wise and store the result as cr_loan_prep.
  • Print the columns of the new data set.
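The steps above can be sketched on a small stand-in data frame. This is only an illustration under stated assumptions: toy_loans and its columns are invented here and do not reflect the actual contents of cr_loan_clean, and the text column is created with an explicit object dtype so that select_dtypes picks it up.

```python
import pandas as pd

# Hypothetical stand-in for the loan data; columns and values are invented
toy_loans = pd.DataFrame({
    "person_age": [22, 35, 41],
    "loan_amnt": [5000, 12000, 8000],
    "home_ownership": pd.Series(["RENT", "OWN", "RENT"], dtype="object"),
})

# Split the data into numeric and non-numeric columns
toy_num = toy_loans.select_dtypes(exclude=["object"])
toy_str = toy_loans.select_dtypes(include=["object"])

# One-hot encode the non-numeric columns
toy_str_onehot = pd.get_dummies(toy_str)

# Place the encoded columns next to the numeric ones (axis=1 joins column-wise)
toy_prep = pd.concat([toy_num, toy_str_onehot], axis=1)

# Inspect the resulting column names
print(toy_prep.columns)
```

Because pd.concat with axis=1 aligns rows on the index, this works cleanly only when both pieces come from the same data frame and keep its original index, as they do here.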

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create two data sets for numeric and non-numeric data
____ = ____.select_dtypes(exclude=['object'])
____ = ____.select_dtypes(include=['object'])

# One-hot encode the non-numeric columns
____ = pd.____(____)

# Concatenate the one-hot encoded columns with the numeric ones (column-wise)
____ = pd.concat([____, ____], axis=1)

# Print the columns in the new data set
print(____.columns)