Exercise

Cage match! Amazon vs. Google pro reviews

Amazon's positive reviews appear to mention bigrams such as "good benefits", while its negative reviews focus on bigrams related to workload and work-life balance issues.

In contrast, Google's positive reviews mention "great food", "perks", "smart people", and "fun culture", among other things. Google's negative reviews discuss "politics", "getting big", "bureaucracy", and "middle management".

You decide to make a pyramid plot that lines up the positive reviews for Amazon and Google so you can compare how often each company's reviews use the bigrams they share.
We have preloaded a data frame, all_tdm_df, consisting of terms and the corresponding AmazonPro and GooglePro bigram frequencies. Using this data frame, you will identify the 5 shared bigrams whose frequencies differ most between the two corpora.

Instructions

  • Create common_words from all_tdm_df using dplyr functions.
    • Use filter() to keep rows with nonzero values in the AmazonPro column.
    • Likewise, filter the GooglePro column for nonzero values.
    • Then use mutate() to add a new column, diff, holding the absolute difference (abs()) between the two term frequency columns.
  • Pipe common_words into slice_max() to create top5_df, ranking rows by the diff column and keeping the top 5 (n = 5). It will print to your console for review.
  • Create a pyramid.plot() (from the plotrix package), passing in top5_df$AmazonPro, then top5_df$GooglePro, and finally adding labels with top5_df$terms.
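The steps above can be sketched in R roughly as follows. This is a minimal sketch, not the exercise's official solution. The preloaded all_tdm_df is replaced by a small hypothetical data frame with made-up counts, and the top.labels, main, unit and gap arguments passed to plotrix::pyramid.plot() are illustrative choices. The plot is only drawn if plotrix is installed.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical stand-in for the preloaded all_tdm_df (toy counts, not real review data)
all_tdm_df <- data.frame(
  terms     = c("good benefits", "smart people", "great food", "fast paced",
                "work life", "free food", "good pay"),
  AmazonPro = c(25, 10, 0, 30, 5, 2, 12),
  GooglePro = c(8, 40, 20, 4, 6, 18, 12),
  stringsAsFactors = FALSE
)

# Keep only bigrams that occur in both corpora, then measure how far apart they are
common_words <- all_tdm_df %>%
  filter(AmazonPro != 0) %>%                  # drop bigrams absent from Amazon pros
  filter(GooglePro != 0) %>%                  # drop bigrams absent from Google pros
  mutate(diff = abs(AmazonPro - GooglePro))   # absolute frequency difference

# Rows with the 5 largest diff values, sorted in descending order of diff
top5_df <- common_words %>%
  slice_max(diff, n = 5)

print(top5_df)

# Pyramid plot: Amazon counts on the left, Google counts on the right
if (requireNamespace("plotrix", quietly = TRUE)) {
  plotrix::pyramid.plot(
    top5_df$AmazonPro,
    top5_df$GooglePro,
    labels     = top5_df$terms,
    top.labels = c("Amazon", "Pro bigrams", "Google"),
    main       = "Bigrams in common",
    unit       = "Frequency",
    gap        = 8   # width of the centre gap for labels, in data units (illustrative)
  )
}
```

Note that slice_max() keeps ties by default (with_ties = TRUE), so top5_df can have more than 5 rows if several bigrams share the fifth-largest diff. Pass with_ties = FALSE to cap it at exactly 5.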