Exercise

Word association

As expected, similar topics appear throughout the dendrogram. Switching back to the positive comments, you decide to examine the top phrases that appeared in the word clouds and look for associated terms with the findAssocs() function from tm. Now that you have learned about long hours and a lack of work-life balance, you want to check whether anything surprising turns up.

Instructions

The amzn_pros_corp corpus has already been cleaned with the same custom functions as before.

  • Construct a term-document matrix (TDM) called amzn_p_tdm from amzn_pros_corp, passing control = list(tokenize = tokenizer).
  • Create amzn_p_m by converting amzn_p_tdm to a matrix.
  • Create amzn_p_freq by applying rowSums() to amzn_p_m.
  • Create term_frequency by applying sort() to amzn_p_freq with decreasing = TRUE.
  • Examine the five most frequent bigrams with term_frequency[1:5].
  • You may be surprised to see "fast paced" among the top terms, since it could be a negative phrase related to "long hours". To see which terms are most associated with "fast paced", call findAssocs() on amzn_p_tdm with a correlation cutoff of 0.2.
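The steps above can be sketched in R roughly as follows. This is a minimal sketch under stated assumptions: the real amzn_pros_corp is not available here, so a tiny hypothetical corpus of five already-cleaned comments stands in for it, and because the exercise does not show how tokenizer is defined, a bigram tokenizer built from NLP's ngrams() and words() is assumed. The stand-in corpus is a VCorpus because TermDocumentMatrix() ignores custom tokenizer functions when given a SimpleCorpus.

```r
# Sketch only: amzn_pros_corp and tokenizer below are hypothetical stand-ins,
# not the exercise's actual data or tokenizer.
library(tm)  # also attaches NLP, which provides ngrams() and words()

# Hypothetical cleaned "pros" comments standing in for the real corpus
pros <- c(
  "fast paced environment smart people",
  "fast paced work great benefits",
  "smart people great benefits",
  "good pay fast paced learning",
  "great benefits good pay"
)
amzn_pros_corp <- VCorpus(VectorSource(pros))

# Assumed bigram tokenizer: joins each word with the word that follows it
tokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "), use.names = FALSE)
}

# Rows are bigrams, columns are comments
amzn_p_tdm <- TermDocumentMatrix(amzn_pros_corp,
                                 control = list(tokenize = tokenizer))

# Total count of each bigram across all comments, most frequent first
amzn_p_m <- as.matrix(amzn_p_tdm)
amzn_p_freq <- rowSums(amzn_p_m)
term_frequency <- sort(amzn_p_freq, decreasing = TRUE)
print(term_frequency[1:5])

# Bigrams whose per-comment counts correlate with "fast paced" at >= 0.2
print(findAssocs(amzn_p_tdm, "fast paced", 0.2))
```

findAssocs() returns a named list with one element per query term; each element holds the other terms whose per-document counts correlate with that term at or above the cutoff, sorted from strongest to weakest. A cutoff of 0.2 is fairly permissive, so it tends to return more terms than a higher limit would.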