One-hot encoding transaction data
Throughout the course, we will use a common pipeline to preprocess data for market basket analysis. The first step is to load the data into a pandas DataFrame and select the column that contains the transactions. Each transaction in that column is a string of items separated by commas. The next step is to use a lambda function to split each transaction string into a list, which turns the column into a list of lists.
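A minimal sketch of these first steps follows. It assumes a hypothetical DataFrame whose Transaction column holds comma-separated item strings. The column name and the items are illustrative and are not taken from the grocery dataset.

```python
import pandas as pd

# Hypothetical raw data: one comma-separated string per transaction
df = pd.DataFrame({"Transaction": ["milk,bread", "bread,butter,jam", "milk"]})

# Select the transaction column and split each string into a list of items
transactions = df["Transaction"].apply(lambda t: t.split(",")).tolist()

print(transactions)
# [['milk', 'bread'], ['bread', 'butter', 'jam'], ['milk']]
```

If the raw strings contain spaces after the commas, strip each item inside the lambda, for example with [item.strip() for item in t.split(",")]. Otherwise "bread" and " bread" would be treated as different items.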
In this exercise, you'll start with the list of lists from the grocery dataset, which is available to you as transactions. You will then transform transactions into a one-hot encoded DataFrame. Each column corresponds to one item and holds True and False values that indicate whether that item was included in each transaction.
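To make the target layout concrete, the following sketch builds the same kind of table by hand with plain Python and pandas. The three transactions are hypothetical stand-ins for the grocery data. This is not how mlxtend implements the encoding internally. It only shows the shape of the result: one column per unique item, in sorted order, and one row per transaction.

```python
import pandas as pd

# Hypothetical transactions (list of lists), standing in for the grocery data
transactions = [["milk", "bread"], ["bread", "butter", "jam"], ["milk"]]

# Unique items across all transactions, sorted to give a stable column order
items = sorted({item for t in transactions for item in t})

# One row per transaction: True if the item occurs in that transaction
rows = [[item in t for item in items] for t in transactions]
onehot = pd.DataFrame(rows, columns=items)

print(onehot)
```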
This exercise is part of the course “Market Basket Analysis in Python”.
Exercise instructions
- From the mlxtend.preprocessing module, import TransactionEncoder.
- Instantiate a transaction encoder and identify the unique items in transactions.
- One-hot encode transactions into an array and assign its values to onehot.
- Convert the array into a pandas DataFrame using the item names as column headers.
The completed sample code follows. It assumes transactions is the list of lists already loaded in the exercise environment.
# Import the TransactionEncoder class from mlxtend
from mlxtend.preprocessing import TransactionEncoder
import pandas as pd
# Instantiate transaction encoder and identify unique items in transactions
# (fit stores the unique items, sorted, in encoder.columns_)
encoder = TransactionEncoder().fit(transactions)
# One-hot encode transactions into a Boolean NumPy array
onehot = encoder.transform(transactions)
# Convert one-hot encoded data to DataFrame, using the item names as column headers
onehot = pd.DataFrame(onehot, columns=encoder.columns_)
# Print the one-hot encoded transaction dataset
print(onehot)
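One way to sanity-check a one-hot table is to read each row back into a list of items and compare it with the input. The following is a minimal pandas-only sketch. It uses a small hypothetical DataFrame in the same layout as onehot, with sorted item columns and Boolean values, rather than the grocery data.

```python
import pandas as pd

# Hypothetical one-hot table in the same layout TransactionEncoder produces
onehot = pd.DataFrame(
    {
        "bread": [True, True, False],
        "butter": [False, True, False],
        "jam": [False, True, False],
        "milk": [True, False, True],
    }
)

# For each row, keep the column names whose value is True
recovered = [list(onehot.columns[row]) for row in onehot.to_numpy()]
print(recovered)  # [['bread', 'milk'], ['bread', 'butter', 'jam'], ['milk']]

# Hypothetical original transactions; compare as sets, since order is lost
original = [["milk", "bread"], ["bread", "butter", "jam"], ["milk"]]
matches = all(set(r) == set(t) for r, t in zip(recovered, original))
print(matches)  # True
```

Items come back in column (alphabetical) order, and an item listed twice in one transaction appears only once. For these reasons the comparison uses sets rather than lists.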