Word association
As expected, similar topics appear throughout the dendrogram. Switching back to positive comments, you decide to examine the top phrases that appeared in the word clouds. You hope to find associated terms using the findAssocs() function from tm. Now that you have learned about long hours and a lack of work-life balance, you want to check for anything surprising.
This exercise is part of the course
Text Mining with Bag-of-Words in R
Instructions
The amzn_pros_corp corpus has been cleaned using the same custom functions as before.
- Construct a TDM called amzn_p_tdm from amzn_pros_corp and control = list(tokenize = tokenizer).
- Create amzn_p_m by converting amzn_p_tdm to a matrix.
- Create amzn_p_freq by applying rowSums() to amzn_p_m.
- Create term_frequency using sort() on amzn_p_freq along with the argument decreasing = TRUE.
- Examine the first 5 bigrams using term_frequency[1:5].
- You may be surprised to see "fast paced" as a top term because it could be a negative term related to "long hours". Look at the terms most associated with "fast paced". Use findAssocs() on amzn_p_tdm to examine "fast paced" with a 0.2 cutoff.
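The steps above can be sketched on a small toy corpus. This is only an illustration under stated assumptions: the four review strings, the toy_* object names, and the bigram tokenizer definition are invented here, since the course's actual amzn_pros_corp and tokenizer are not shown on this page. The tokenizer follows the n-gram pattern based on ngrams() and words() from the NLP package, which tm loads.

```r
library(tm)

# Assumption: four made-up reviews standing in for the cleaned amzn_pros_corp.
reviews <- c(
  "fast paced environment with smart people",
  "fast paced environment and good pay",
  "good pay and great benefits",
  "smart people and great benefits"
)
toy_corp <- VCorpus(VectorSource(reviews))

# Assumption: a bigram tokenizer; the course's own `tokenizer` may be defined differently.
tokenizer <- function(x) {
  unlist(
    lapply(NLP::ngrams(NLP::words(x), 2L), paste, collapse = " "),
    use.names = FALSE
  )
}

# Build a term-document matrix whose terms are bigrams
toy_tdm <- TermDocumentMatrix(toy_corp, control = list(tokenize = tokenizer))

# Convert to a plain matrix: rows are bigrams, columns are documents
toy_m <- as.matrix(toy_tdm)

# Total count of each bigram across all documents
toy_freq <- rowSums(toy_m)

# Most frequent bigrams first
term_frequency <- sort(toy_freq, decreasing = TRUE)
print(term_frequency[1:5])

# Bigrams whose document-level correlation with "fast paced" is at least 0.2
toy_assocs <- findAssocs(toy_tdm, "fast paced", 0.2)
print(toy_assocs)
```

findAssocs() returns a list with one element per queried term, so the associations for "fast paced" can be pulled out with toy_assocs[["fast paced"]]. With so few documents the correlations are coarse; on a real corpus the 0.2 cutoff filters much more selectively.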
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Create amzn_p_tdm
___ <- ___(
___,
___
)
# Create amzn_p_m
___ <- ___
# Create amzn_p_freq
___ <- ___
# Create term_frequency
___ <- ___
# Print the 5 most common terms
___
# Find associations with "fast paced"
___