One-hot encoding transaction data
Throughout the course, we will use a common pipeline for preprocessing data for use in market basket analysis. The first step is to load the data into a pandas DataFrame and select the column that contains transactions. Each transaction in the column is a string of items separated by commas. The next step is to use a lambda function to split each transaction string into a list, thereby transforming the column into a list of lists.
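As a rough illustration of that pipeline, the sketch below builds a tiny DataFrame and splits its transaction strings into lists. The column name Transaction and the items are made up for this example and are not the course's grocery data.

```python
import pandas as pd

# Hypothetical stand-in for the grocery data; the column name
# "Transaction" and the items are assumptions for illustration.
groceries = pd.DataFrame({
    "Transaction": ["milk,bread,biscuit", "bread,tea", "milk,tea,bread"]
})

# Split each comma-separated string into a list of items
transactions = groceries["Transaction"].apply(lambda t: t.split(","))

# Convert the resulting Series into a plain list of lists
transactions = list(transactions)

print(transactions)
```

If the real strings contain spaces after the commas, the items would likely need stripping as well, for example with item.strip() inside the lambda.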
In this exercise, you'll start with the list of lists from the grocery dataset, which is available to you as transactions. You will then transform transactions into a one-hot encoded DataFrame, where each column consists of True and False values that indicate whether an item was included in a transaction.
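To make the target format concrete, here is a minimal plain-Python sketch of what one-hot encoding a list of lists produces. It is only an illustration of the idea on made-up data, not the mlxtend implementation you will use in the exercise, and the names sample, items, and rows are hypothetical.

```python
# Hypothetical transactions; the real exercise provides `transactions`.
sample = [["milk", "bread"], ["bread", "tea"], ["milk", "tea", "bread"]]

# Collect the unique items and sort them so the column order is stable
items = sorted({item for basket in sample for item in basket})

# One row per transaction, one True/False flag per item
rows = [[item in basket for item in items] for basket in sample]

print(items)
for row in rows:
    print(row)
```

Each inner list in rows corresponds to one transaction, and each position corresponds to one item in items, which is the same layout the one-hot encoded DataFrame uses for its rows and columns.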
This is a part of the course “Market Basket Analysis in Python”
Exercise instructions
- From mlxtend.preprocessing, import TransactionEncoder.
- Instantiate a transaction encoder and identify the unique items in transactions.
- One-hot encode transactions in an array and assign its values to onehot.
- Convert the array into a pandas DataFrame using the item names as column headers.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the TransactionEncoder class from mlxtend
from ____.____ import ____
import pandas as pd
# Instantiate transaction encoder and identify unique items in transactions
encoder = TransactionEncoder().____(____)
# One-hot encode transactions
onehot = encoder.____(transactions)
# Convert one-hot encoded data to DataFrame
onehot = pd.DataFrame(____, columns = encoder.columns_)
# Print the one-hot encoded transaction dataset
print(onehot)
This exercise is part of the course
Market Basket Analysis in Python
Explore association rules in market basket analysis with Python by analyzing bookstore data and creating movie recommendations.
In this chapter, you’ll learn the basics of Market Basket Analysis: association rules, metrics, and pruning. You’ll then apply these concepts to help a small grocery store improve its promotional and product placement efforts.