Word association

As expected, you see similar topics throughout the dendrogram. Switching back to positive comments, you decide to examine top phrases that appeared in the word clouds. You hope to find associated terms using the findAssocs()function from tm. You want to check for something surprising now that you have learned of long hours and a lack of work-life balance.

The amzn_pros_corp corpus has been cleaned using the custom functions like before.

Construct a TDM called amzn_p_tdm from amzn_pros_corp and control = list(tokenize = tokenizer).
Create amzn_p_m by converting amzn_p_tdm to a matrix.
Create amzn_p_freq by applying rowSums() to amzn_p_m.
Create term_frequency using sort() on amzn_p_freq along with the argument decreasing = TRUE.
Examine the first 5 bigrams using term_frequency[1:5].
You may be surprised to see "fast paced" as a top term because it could be a negative term related to "long hours". Look at the terms most associated with "fast paced". Use findAssocs() on amzn_p_tdm to examine "fast paced" with a 0.2 cutoff.

Jumping into Text Mining with Bag-of-Words

Word Clouds and More Interesting Visuals

Adding to Your TM Skills

Battle of the Tech Giants for Talent

Exercise

Word association

Instructions