Word association

As expected, you see similar topics throughout the dendrogram. Switching back to positive comments, you decide to examine the top phrases that appeared in the word clouds. You hope to find associated terms using the findAssocs() function from tm. Now that you have learned of long hours and a lack of work-life balance, you want to check for anything surprising.
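To get a feel for what an association score means before running findAssocs(), here is a minimal base R sketch. It assumes the score is the Pearson correlation between two terms' counts across documents, with terms kept when they meet the cutoff. The matrix toy_m and its counts are invented for illustration and are not taken from the Amazon reviews.

```r
# Toy term-document matrix: rows are terms, columns are documents.
# All counts are invented for illustration.
toy_m <- rbind(
  "fast paced"        = c(1, 1, 0, 1),
  "paced environment" = c(1, 0, 0, 1),
  "good pay"          = c(1, 1, 1, 0)
)
colnames(toy_m) <- paste0("doc", 1:4)

# Correlate "fast paced" with every other term across the documents
target <- toy_m["fast paced", ]
others <- toy_m[rownames(toy_m) != "fast paced", , drop = FALSE]
assoc <- apply(others, 1, function(term_counts) cor(term_counts, target))

# Keep the terms that meet the cutoff, strongest first
cutoff <- 0.2
kept <- sort(assoc[assoc >= cutoff], decreasing = TRUE)
print(round(kept, 2))
```

In this toy data, "paced environment" tends to appear in the same documents as "fast paced", so it passes the cutoff, while "good pay" is negatively correlated and is dropped.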

This exercise is part of the course

Text Mining with Bag-of-Words in R

Exercise instructions

The amzn_pros_corp corpus has already been cleaned with the same custom functions as before.

  • Construct a TDM called amzn_p_tdm from amzn_pros_corp and control = list(tokenize = tokenizer).
  • Create amzn_p_m by converting amzn_p_tdm to a matrix.
  • Create amzn_p_freq by applying rowSums() to amzn_p_m.
  • Create term_frequency using sort() on amzn_p_freq along with the argument decreasing = TRUE.
  • Examine the first 5 bigrams using term_frequency[1:5].
  • You may be surprised to see "fast paced" as a top term because it could be a negative term related to "long hours". Look at the terms most associated with "fast paced". Use findAssocs() on amzn_p_tdm to examine "fast paced" with a 0.2 cutoff.
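The steps above can be sketched end to end on a toy corpus. This is only an illustration under stated assumptions: toy_reviews and the toy_* objects are invented, and the tokenizer defined here is a hypothetical stand-in built from words() and ngrams() in the NLP package (attached with tm), since the course's own tokenizer is not shown in this exercise.

```r
library(tm)  # also attaches NLP, which provides words() and ngrams()

# Hypothetical stand-in for the course's bigram tokenizer
tokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "), use.names = FALSE)
}

# Toy reviews, invented for illustration only
toy_reviews <- c(
  "fast paced environment good pay",
  "good pay fast paced team",
  "smart people good pay",
  "fast paced environment smart people"
)
# A VCorpus is used because custom tokenizers apply to it
toy_corp <- VCorpus(VectorSource(toy_reviews))

# Build the bigram TDM, convert it to a matrix, and sum counts per term
toy_tdm <- TermDocumentMatrix(toy_corp, control = list(tokenize = tokenizer))
toy_m <- as.matrix(toy_tdm)
toy_freq <- rowSums(toy_m)

# Sort terms from most to least frequent and look at the top five
term_frequency <- sort(toy_freq, decreasing = TRUE)
print(term_frequency[1:5])

# Terms associated with "fast paced" at a 0.2 correlation cutoff
assocs <- findAssocs(toy_tdm, "fast paced", 0.2)
print(assocs)
```

The exercise follows the same sequence with amzn_pros_corp and the provided tokenizer in place of the toy objects.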

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Create amzn_p_tdm
___ <- ___(
  ___,
  ___
)

# Create amzn_p_m
___ <- ___

# Create amzn_p_freq
___ <- ___

# Create term_frequency
___ <- ___

# Print the 5 most common terms
___

# Find associations with "fast paced"
___