One-hot encoding transaction data

Throughout the course, we will use a common pipeline to preprocess data for market basket analysis. The first step is to load the data into a pandas DataFrame and select the column that contains the transactions. Each transaction in that column is a string of items separated by commas. The next step is to use a lambda function to split each transaction string into a list of items, which turns the column into a list of lists.
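The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the course's own code: the DataFrame contents and the column name "Transaction" are made-up stand-ins for the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the course data: one column of
# comma-separated item strings. The column name is an assumption.
groceries = pd.DataFrame({
    "Transaction": ["milk,bread,biscuit", "bread,milk", "tea,biscuit"]
})

# Select the column that holds the transaction strings.
raw = groceries["Transaction"]

# Split each string on commas, then convert the Series to a list of lists.
transactions = raw.apply(lambda t: t.split(",")).tolist()

print(transactions)
```

If the real strings contain spaces after the commas, each item would probably also need a strip() call before further processing.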

In this exercise, you'll start with the list of lists from the grocery dataset, which is available to you as transactions. You will then transform transactions into a one-hot encoded DataFrame with one column per item, where each value is True or False depending on whether that item was included in the transaction.
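To make the target format concrete, here is a hand-rolled sketch of one-hot encoding using only pandas and plain Python. It is not the method used in the exercise, which relies on mlxtend's TransactionEncoder, and the toy transactions below are invented for illustration.

```python
import pandas as pd

# Made-up transactions for illustration (not the grocery dataset).
toy_transactions = [["milk", "bread"], ["bread", "jam"], ["milk"]]

# Collect the unique items in sorted order; these become the column headers.
items = sorted({item for basket in toy_transactions for item in basket})

# Build one row per transaction, with one boolean per item.
rows = [[item in basket for item in items] for basket in toy_transactions]
toy_onehot = pd.DataFrame(rows, columns=items)

print(toy_onehot)
```

Each row corresponds to one transaction and each column to one item, so a True in row 0 under "bread" means the first basket contained bread.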

This exercise is part of the course “Market Basket Analysis in Python”.


Exercise instructions

  • From mlxtend.preprocessing, import TransactionEncoder.
  • Instantiate a transaction encoder and identify the unique items in transactions.
  • One-hot encode transactions into an array and assign it to onehot.
  • Convert the array into a pandas DataFrame, using the item names as column headers.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the TransactionEncoder class from mlxtend
from ____.____ import ____
import pandas as pd

# Instantiate transaction encoder and identify unique items in transactions
encoder = TransactionEncoder().____(____)

# One-hot encode transactions
onehot = encoder.____(transactions)

# Convert one-hot encoded data to DataFrame
onehot = pd.DataFrame(____, columns = encoder.columns_)

# Print the one-hot encoded transaction dataset
print(onehot)