Cage match! Amazon vs. Google pro reviews
Amazon's positive reviews appear to mention bigrams such as "good benefits", while its negative reviews focus on terms such as "workload" and "work-life balance".
In contrast, Google's positive reviews mention "great food", "perks", "smart people", and "fun culture", among other things. Google's negative reviews discuss "politics", "getting big", "bureaucracy", and "middle management".
You decide to make a pyramid plot lining up positive reviews for Amazon and Google so you can compare the differences between any shared bigrams.
We have preloaded a data frame, all_tdm_df, consisting of terms and the corresponding AmazonPro and GooglePro bigram frequencies. Using this data frame, you will identify the 5 bigrams shared between the two corpora whose frequencies differ the most.
This exercise is part of the course
Text Mining with Bag-of-Words in R
Exercise instructions
- Create common_words from all_tdm_df using dplyr functions.
  - filter() on the AmazonPro column for nonzero values.
  - Likewise filter the GooglePro column for nonzero values.
  - Then mutate() a new column, diff, which is the abs() (absolute) difference between the term frequency columns.
- Pipe common_words into slice_max() to create top5_df, referencing the diff column and the top 5 values. It will print to your console for review.
- Create a pyramid.plot(), passing in top5_df$AmazonPro, then top5_df$GooglePro, and finally add labels with top5_df$terms.
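The filtering and ranking steps above can be sketched on a small made-up data frame. Here toy_tdm_df and its counts are hypothetical stand-ins for the preloaded all_tdm_df, not values from the exercise; only the column names terms, AmazonPro, and GooglePro come from the instructions. The sketch assumes dplyr 1.0.0 or later, where slice_max() is available.

```r
library(dplyr)

# Hypothetical toy data; the real all_tdm_df is preloaded in the exercise
toy_tdm_df <- data.frame(
  terms     = c("great benefits", "smart people", "free food", "good pay",
                "fun culture", "work life", "great perks", "good people"),
  AmazonPro = c(12, 3, 0, 9, 1, 7, 2, 5),
  GooglePro = c(4, 15, 20, 0, 11, 6, 15, 5)
)

# Keep only bigrams that occur in both corpora, then measure how far apart
# their frequencies are
common_toy <- toy_tdm_df %>%
  filter(AmazonPro != 0, GooglePro != 0) %>%
  mutate(diff = abs(AmazonPro - GooglePro))

# Keep the 5 rows with the largest diff, sorted from largest to smallest
top5_toy <- common_toy %>% slice_max(diff, n = 5)
print(top5_toy)
```

In this toy data, "free food" and "good pay" are dropped by filter() because one of their counts is zero, and "good people" survives the filter but falls outside the top 5 because its difference is zero.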
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Filter to words in common and create an absolute diff column
common_words <- all_tdm_df %>%
  filter(
    ___ != 0,
    ___ != 0
  ) %>%
  ___(diff = ___(___ - ___))
# Extract top 5 common bigrams
(top5_df <- common_words %>% ___(___, n = ___))
# Create the pyramid plot
pyramid.plot(top5_df$___, top5_df$___,
             labels = top5_df$___, gap = 12,
             top.labels = c("Amzn", "Pro Words", "Goog"),
             main = "Words in Common", unit = NULL)
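To see what the pyramid plot call produces without the preloaded data, here is a minimal sketch on made-up counts. It assumes the plotrix package (which provides pyramid.plot()) is installed; top5_demo and its values are hypothetical and are not the exercise's results.

```r
library(plotrix)

# Hypothetical stand-in for top5_df; counts are invented for illustration
top5_demo <- data.frame(
  terms     = c("great perks", "smart people", "fun culture",
                "great benefits", "work life"),
  AmazonPro = c(10, 14, 8, 35, 22),
  GooglePro = c(38, 40, 30, 12, 20)
)

# Left bars: first vector; right bars: second vector; labels run down the middle.
# gap sets the width reserved for the center labels, in the same units as the counts.
pyramid.plot(top5_demo$AmazonPro, top5_demo$GooglePro,
             labels = top5_demo$terms, gap = 12,
             top.labels = c("Amzn", "Pro Words", "Goog"),
             main = "Words in Common", unit = NULL)
```

If the center labels are clipped or crowd the bars, adjusting gap is the first thing to try.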