Organizing transcribed phone call data
We're almost ready to build a text classifier. But right now, all of our transcribed text data is in two lists, pre_purchase_text and post_purchase_text.
To organize it better for building a text classifier as well as for future use, we'll put it together into a pandas DataFrame.
To start, we'll import pandas as pd. Then we'll create a post-purchase DataFrame, post_purchase_df, using pd.DataFrame().
We'll pass pd.DataFrame() a dictionary containing a "label" key with a value of "post_purchase" and a "text" key with a value of our post_purchase_text list.
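As a minimal sketch of this step, assume a short made-up list stands in for the real transcriptions (the sample sentences below are hypothetical, not taken from the course data). Because "label" is a single string and "text" is a list, pandas repeats the label for every item in the list, so each row ends up with the same label.

```python
import pandas as pd

# Hypothetical stand-in for the real transcribed calls
post_purchase_text = [
    "my order arrived damaged",
    "i would like to return my shoes",
]

# The scalar "label" value is repeated for every row of the "text" list
post_purchase_df = pd.DataFrame({"label": "post_purchase",
                                 "text": post_purchase_text})

print(post_purchase_df)
```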
We'll do the same for pre_purchase_df, except with pre_purchase_text.
To have all the data in one place, we'll use pd.concat() and pass it the pre- and post-purchase DataFrames.
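The combining step can be sketched as follows, assuming small hypothetical lists in place of the real data. By default pd.concat() keeps each DataFrame's original row index, so index values repeat in the result. Passing ignore_index=True is one optional way to renumber the rows; the exercise itself does not require it.

```python
import pandas as pd

# Hypothetical sample data standing in for the transcribed calls
pre_purchase_text = ["do you ship to canada", "what sizes do you have"]
post_purchase_text = ["my order arrived damaged", "i need a refund"]

pre_purchase_df = pd.DataFrame({"label": "pre_purchase",
                                "text": pre_purchase_text})
post_purchase_df = pd.DataFrame({"label": "post_purchase",
                                 "text": post_purchase_text})

# Stack the two DataFrames row-wise; each keeps its own index (0, 1, 0, 1)
df = pd.concat([post_purchase_df, pre_purchase_df])

# Optional variant: renumber the rows 0..n-1 instead
df_reset = pd.concat([post_purchase_df, pre_purchase_df], ignore_index=True)

print(df["label"].value_counts())
```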
This exercise is part of the course
Spoken Language Processing in Python
Exercise instructions
- Create post_purchase_df using the post_purchase_text list.
- Create pre_purchase_df using the pre_purchase_text list.
- Combine the two DataFrames using pd.concat().
Interactive exercise
Try this exercise by completing this sample code.
import pandas as pd
# Make dataframes with the text
post_purchase_df = pd.DataFrame({"label": "post_purchase",
                                 "text": ____})
pre_purchase_df = pd.____({"label": "pre_purchase",
                           "text": ____})
# Combine DataFrames
df = pd.____([post_purchase_df, pre_purchase_df])
# Print the combined DataFrame
print(df.head())