Word association

As expected, you see similar topics throughout the dendrogram. Switching back to positive comments, you decide to examine the top phrases that appeared in the word clouds. You hope to find associated terms using the findAssocs() function from tm. Having learned of long hours and a lack of work-life balance, you want to check for anything surprising.
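To make the idea of "associated terms" concrete, here is a minimal sketch of how findAssocs() scores them: it reports the correlation between a term's per-document counts and those of every other term, keeping only values at or above the cutoff. The four short documents below are invented for illustration and are not the course's Amazon review data.

```r
library(tm)

# Hypothetical toy documents (not the course data)
docs <- c(
  "long hours but great pay",
  "long hours and high stress",
  "great pay and great team",
  "friendly team"
)
corp <- VCorpus(VectorSource(docs))

# Unigram term-document matrix: rows are terms, columns are documents
tdm <- TermDocumentMatrix(corp)
m <- as.matrix(tdm)

# Correlation computed by hand from the two terms' count vectors
manual <- cor(m["long", ], m["hours", ])

# The same relationship via findAssocs(), with a 0.2 lower limit
assoc <- findAssocs(tdm, "long", 0.2)

print(manual)
print(assoc)
```

In this toy corpus "long" and "hours" always occur together, so they are perfectly correlated. In real review data the values are usually much lower, which is why a modest cutoff such as 0.2 is often used.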

This exercise is part of the course

Text Mining with Bag-of-Words in R

Exercise instructions

The amzn_pros_corp corpus has been cleaned using the same custom functions as before.

  • Construct a TDM called amzn_p_tdm from amzn_pros_corp, passing control = list(tokenize = tokenizer).
  • Create amzn_p_m by converting amzn_p_tdm to a matrix.
  • Create amzn_p_freq by applying rowSums() to amzn_p_m.
  • Create term_frequency using sort() on amzn_p_freq along with the argument decreasing = TRUE.
  • Examine the first 5 bigrams using term_frequency[1:5].
  • You may be surprised to see "fast paced" as a top term because it could be a negative term related to "long hours". Look at the terms most associated with "fast paced". Use findAssocs() on amzn_p_tdm to examine "fast paced" with a 0.2 cutoff.
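The steps above can be sketched on a small stand-in corpus. Everything here is an assumption made for illustration: the four reviews are invented, the object names (toy_corp, toy_tdm, and so on) are hypothetical, and the bigram tokenizer follows the pattern from the tm FAQ rather than the course's own tokenizer, whose definition is not shown on this page.

```r
library(tm)

# Hypothetical positive reviews standing in for amzn_pros_corp
reviews <- c(
  "fast paced environment with smart people",
  "great benefits and smart people",
  "fast paced environment and great benefits",
  "good pay and great benefits"
)
toy_corp <- VCorpus(VectorSource(reviews))

# Assumed bigram tokenizer: joins each pair of adjacent words
tokenizer <- function(x) {
  unlist(
    lapply(NLP::ngrams(NLP::words(x), 2), paste, collapse = " "),
    use.names = FALSE
  )
}

# Bigram term-document matrix
toy_tdm <- TermDocumentMatrix(toy_corp, control = list(tokenize = tokenizer))

# Dense matrix, then total count of each bigram across documents
toy_m <- as.matrix(toy_tdm)
toy_freq <- rowSums(toy_m)

# Most frequent bigrams first
term_frequency <- sort(toy_freq, decreasing = TRUE)
print(term_frequency[1:5])

# Bigrams whose document counts correlate with "fast paced" at 0.2 or more
fast_assoc <- findAssocs(toy_tdm, "fast paced", 0.2)
print(fast_assoc)
```

A VCorpus is used because custom tokenizers are applied to it document by document. Converting to a dense matrix is fine at this size, but it can be memory-hungry on a large corpus.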

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create amzn_p_tdm
___ <- ___(
  ___,
  ___
)

# Create amzn_p_m
___ <- ___

# Create amzn_p_freq
___ <- ___

# Create term_frequency
___ <- ___

# Print the 5 most common terms
___

# Find associations with "fast paced"
___