One-hot encoding transaction data
Throughout the course, we will use a common pipeline for preprocessing data for use in market basket analysis. The first step is to import a pandas DataFrame and select the column that contains transactions. Each transaction in the column will be a string that consists of a number of items, each separated by a comma. The next step is to use a lambda function to split each transaction string into a list, thereby transforming the column into a list of lists.
In this exercise, you'll start with the list of lists from the grocery dataset, which is available to you as transactions. You will then transform transactions into a one-hot encoded DataFrame, where each column consists of TRUE and FALSE values that indicate whether an item was included in a transaction.
Diese Übung ist Teil des Kurses
Market Basket Analysis in Python
Anleitung zur Übung
- From the
mlxtend.preprocessing, importTransactionEncoder - Instantiate a transaction encoder and identify the unique items in
transactions. - One-hot encode
transactionsin an array and assign its values toonehot. - Convert the array into a
pandasDataFrame using the item names as column headers.
Interaktive Übung
Vervollständige den Beispielcode, um diese Übung erfolgreich abzuschließen.
# Import the transaction encoder function from mlxtend
from ____.____ import ____
import pandas as pd
# Instantiate transaction encoder and identify unique items in transactions
encoder = TransactionEncoder().____(____)
# One-hot encode transactions
onehot = encoder.____(transactions)
# Convert one-hot encoded data to DataFrame
onehot = pd.DataFrame(____, columns = encoder.columns_)
# Print the one-hot encoded transaction dataset
print(onehot)