
Cage match! Amazon vs. Google pro reviews

Amazon's positive reviews appear to mention bigrams such as "good benefits", while its negative reviews focus on issues such as "workload" and "work-life balance".

In contrast, Google's positive reviews mention "great food", "perks", "smart people", and "fun culture", among other things. Google's negative reviews discuss "politics", "getting big", "bureaucracy", and "middle management".

You decide to make a pyramid plot lining up positive reviews for Amazon and Google so you can compare the differences between any shared bigrams.
We have preloaded a data frame, all_tdm_df, containing terms and their corresponding AmazonPro and GooglePro bigram frequencies. Using this data frame, you will identify the 5 bigrams shared between the two corpora whose frequencies differ the most.

This exercise is part of the course

Text Mining with Bag-of-Words in R


Exercise instructions

  • Create common_words from all_tdm_df using dplyr functions.
    • filter() on the AmazonPro column for nonzero values.
    • Likewise filter the GooglePro column for nonzero values.
    • Then mutate() a new column, diff, which is the abs() (absolute) difference between the two term frequency columns.
  • Pipe common_words into slice_max() to create top5_df, ordering by the diff column and keeping the top 5 values. It will print to your console for review.
  • Create a pyramid.plot(), passing in top5_df$AmazonPro, then top5_df$GooglePro, and finally adding labels with top5_df$terms.
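
The filtering and ranking steps above can be sketched on a small stand-in data frame. This is only an illustration under stated assumptions: toy_tdm_df and its counts are made up and are not the course's all_tdm_df, and n = 2 is used instead of 5 so that the result stays easy to check by eye.

```r
library(dplyr)

# Hypothetical stand-in for all_tdm_df; the counts are invented for illustration
toy_tdm_df <- data.frame(
  terms     = c("good benefits", "smart people", "great place",
                "free food", "work life", "fast paced"),
  AmazonPro = c(12, 3, 8, 0, 5, 9),
  GooglePro = c(4, 15, 7, 20, 5, 0),
  stringsAsFactors = FALSE
)

# Keep only bigrams that occur in both corpora, then add the absolute difference
toy_common <- toy_tdm_df %>%
  filter(AmazonPro != 0, GooglePro != 0) %>%
  mutate(diff = abs(AmazonPro - GooglePro))

# Keep the rows with the largest diff (2 here; the exercise asks for 5)
toy_top <- toy_common %>% slice_max(diff, n = 2)
print(toy_top)
```

"free food" and "fast paced" are dropped because each is missing from one corpus, so the ranking only considers bigrams that both sets of reviews actually share.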

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Filter to words in common and create an absolute diff column
common_words <- all_tdm_df %>% 
  filter(
    ___ != 0,
    ___ != 0
  ) %>%
  ___(diff = ___(___ - ___))

# Extract top 5 common bigrams
(top5_df <- common_words %>% ___(___, n = ___))

# Create the pyramid plot
pyramid.plot(top5_df$___, top5_df$___, 
             labels = top5_df$___, gap = 12, 
             top.labels = c("Amzn", "Pro Words", "Goog"), 
             main = "Words in Common", unit = NULL)
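
To see how the plotting arguments line up, here is a minimal sketch that calls pyramid.plot() from the plotrix package on a hand-made data frame. The name toy_top5, its counts, and the gap value are assumptions chosen for illustration only; they are not the exercise's data or its expected answer.

```r
library(plotrix)

# Hypothetical top-5 table; counts are invented for illustration
toy_top5 <- data.frame(
  terms     = c("good benefits", "smart people", "great place",
                "work life", "great company"),
  AmazonPro = c(25, 10, 18, 7, 14),
  GooglePro = c(9, 28, 6, 20, 4),
  stringsAsFactors = FALSE
)

# Left bars take the first vector, right bars the second,
# and the labels are drawn in the central gap between them
pyramid.plot(toy_top5$AmazonPro, toy_top5$GooglePro,
             labels = toy_top5$terms, gap = 10,
             top.labels = c("Amzn", "Pro Words", "Goog"),
             main = "Words in Common", unit = NULL)
```

The gap argument is measured in the same units as the bar values, so it generally needs to be scaled to the frequencies being plotted for the labels to fit.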