
Cage match! Amazon vs. Google pro reviews

Amazon's positive reviews appear to mention bigrams such as "good benefits", while its negative reviews focus on bigrams about workload and work-life balance.

In contrast, Google's positive reviews mention "great food", "perks", "smart people", and "fun culture", among other things. Google's negative reviews discuss "politics", "getting big", "bureaucracy", and "middle management".

You decide to make a pyramid plot lining up positive reviews for Amazon and Google so you can compare how often each shared bigram appears in the two corpora.

We have preloaded a data frame, all_tdm_df, consisting of a terms column and the corresponding AmazonPro and GooglePro bigram frequencies. Using this data frame, you will identify the 5 shared bigrams whose frequencies differ most between the two corpora.

This exercise is part of the course Text Mining with Bag-of-Words in R.

Exercise instructions

  • Create common_words from all_tdm_df using dplyr functions.
    • filter() on the AmazonPro column for nonzero values.
    • Likewise filter the GooglePro column for nonzero values.
    • Then use mutate() to add a new column, diff, holding the absolute difference (abs()) between the two term-frequency columns.
  • Pipe common_words into slice_max(), ordering by the diff column with n = 5, to create top5_df. Because the assignment is wrapped in parentheses, the result prints to your console for review.
  • Create a pyramid plot with pyramid.plot(), passing in top5_df$AmazonPro, then top5_df$GooglePro, and finally the labels top5_df$terms.
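The filtering and ranking logic in the steps above can be sketched in base R (swapped in for dplyr here so the example runs without extra packages). It uses a small made-up data frame, toy_tdm_df, standing in for all_tdm_df; the column names mirror the exercise, but the terms and counts are hypothetical and not taken from the course data.

```r
# Hypothetical toy counts, made up for illustration only
toy_tdm_df <- data.frame(
  terms     = c("good benefits", "great food", "smart people",
                "fun culture", "work life", "middle management"),
  AmazonPro = c(20, 0, 4, 2, 11, 0),
  GooglePro = c(5, 18, 16, 9, 10, 6),
  stringsAsFactors = FALSE
)

# Keep only bigrams that appear in both corpora (nonzero in each column)
shared <- toy_tdm_df[toy_tdm_df$AmazonPro != 0 & toy_tdm_df$GooglePro != 0, ]

# Absolute difference between the two frequency columns
shared$diff <- abs(shared$AmazonPro - shared$GooglePro)

# Sort by diff, largest first, and keep the top 3 (the toy data is small)
top_shared <- head(shared[order(shared$diff, decreasing = TRUE), ], 3)
print(top_shared)
```

Bigrams with a zero in either column, such as "great food" and "middle management" here, drop out before the difference is computed, so the ranking only covers terms that both corpora share.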

Hands-on interactive exercise

The sample code for this exercise is below.

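library(dplyr)    # filter(), mutate(), slice_max() and the %>% pipe
library(plotrix)  # pyramid.plot()

# Filter to words in common and create an absolute diff column
common_words <- all_tdm_df %>% 
  filter(
    AmazonPro != 0,
    GooglePro != 0
  ) %>%
  mutate(diff = abs(AmazonPro - GooglePro))

# Extract top 5 common bigrams (parentheses print the result)
(top5_df <- common_words %>% slice_max(diff, n = 5))

# Create the pyramid plot: Amazon on the left, Google on the right
pyramid.plot(top5_df$AmazonPro, top5_df$GooglePro, 
             labels = top5_df$terms, gap = 12, 
             top.labels = c("Amzn", "Pro Words", "Goog"), 
             main = "Words in Common", unit = NULL)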
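One detail worth knowing: slice_max() keeps ties by default (with_ties = TRUE), so if several bigrams tie at the fifth-largest diff, top5_df can contain more than five rows and the pyramid plot will show extra bars. The sketch below uses a hypothetical toy data frame, toy_diffs, to show the difference; passing with_ties = FALSE returns exactly n rows.

```r
library(dplyr)

# Hypothetical toy data: two bigrams tie for fifth place on diff
toy_diffs <- data.frame(
  terms = c("b1", "b2", "b3", "b4", "b5", "b6"),
  diff  = c(10, 9, 8, 7, 6, 6)
)

# Default with_ties = TRUE keeps both tied rows, so 6 rows come back
top_with_ties <- toy_diffs %>% slice_max(diff, n = 5)

# with_ties = FALSE caps the result at exactly 5 rows
top_exact <- toy_diffs %>% slice_max(diff, n = 5, with_ties = FALSE)

nrow(top_with_ties)
nrow(top_exact)
```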