Preparing data for market basket analysis
Throughout this course, you will typically encounter data in one of two formats: a pandas DataFrame or a list of lists. DataFrame objects will be constructed by importing a csv file using pandas. They will consist of a single column of data, where each element contains a string of items in a transaction, separated by commas, as in the table below.
In this exercise, you will practice loading the data from a csv file and will prepare it for use as a list of lists. Note that the path to the grocery store dataset has been defined and is available to you as groceries_path.
Transaction |
---|
'milk,bread,biscuit' |
'bread,milk,biscuit,cereal' |
… |
'tea,milk,coffee,cereal' |
This exercise is part of the course
Market Basket Analysis in Python
Exercise instructions
- Import the pandas package under the alias pd.
- Use pandas to read the csv file at the path specified by groceries_path.
- Select the Transaction column from the DataFrame and split each string of comma-separated items into a list.
- Convert the DataFrame of transactions into a list of lists.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import pandas under the alias pd
import ____ as pd
# Load transactions from the csv file using pandas
groceries = pd.____(groceries_path)
# Split transaction strings into lists
transactions = groceries['____'].apply(lambda t: t.split(','))
# Convert DataFrame column into list of lists
transactions = list(____)
# Print the list of transactions
print(transactions)
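
For reference, the same steps can be sketched end to end on a small in-memory stand-in for the grocery file. The csv_text string below is an assumption made for illustration: its three rows are copied from the sample table, and it is not the actual dataset at groceries_path. Each transaction is wrapped in double quotes so that the commas inside it are not treated as column separators.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file at groceries_path (rows taken from the sample table).
# Quoting each row keeps the whole transaction in a single column.
csv_text = (
    'Transaction\n'
    '"milk,bread,biscuit"\n'
    '"bread,milk,biscuit,cereal"\n'
    '"tea,milk,coffee,cereal"\n'
)

# Read the single-column csv into a DataFrame
groceries = pd.read_csv(io.StringIO(csv_text))

# Split each comma-separated string into a list of items
transactions = groceries['Transaction'].apply(lambda t: t.split(','))

# Convert the resulting pandas Series into a plain list of lists
transactions = list(transactions)

print(transactions)
```

On this stand-in data, the result is a list with one inner list per transaction, for example ['milk', 'bread', 'biscuit'] for the first row.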