Comparison Cloud

In this exercise you will build a comparison cloud, a common visual for comparing term frequency across documents. Specifically, you will review the most frequent terms in the collapsed positive and negative documents. Recall the TermDocumentMatrix all_tdm you created earlier. Instead of 1000 separate rental reviews, the matrix contains 2 documents: the reviews were split by their polarity() score and collapsed into one positive and one negative document.

It's usually easier to work with the TDM after converting it to a matrix. From there you simply rename the columns. Remember that the colnames() function is called on the left side of the assignment operator, as shown below.

colnames(OBJECT) <- c("COLUMN_NAME1", "COLUMN_NAME2")
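
To make the replacement form of colnames() concrete, here is a minimal sketch on a toy matrix. The name toy_m and its counts are made up for illustration; in the exercise the matrix would come from as.matrix(all_tdm).

```r
# Toy stand-in for as.matrix(all_tdm): rows are terms, columns are documents.
# The counts and the name toy_m are hypothetical.
toy_m <- matrix(
  c(5, 0,
    1, 4,
    3, 2),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("clean", "dirty", "host"), c("1", "2"))
)

# colnames() sits on the left of <- and replaces the column names in place.
colnames(toy_m) <- c("positive", "negative")

print(colnames(toy_m))
```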

Once done, you will reorder the matrix by each column to see the most frequent terms in the positive and negative documents. Review these terms so you can answer the conclusion exercises!

Lastly, you'll visualize the terms using comparison.cloud().
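
As a rough sketch of that last step, the snippet below calls comparison.cloud() from the wordcloud package on a toy matrix. The matrix toy_tdm_m, its counts, the colors, and the max.words value are all assumptions for illustration; the exercise would pass all_tdm_m instead. The call is guarded so the snippet still runs where wordcloud is not installed.

```r
# Toy term-by-document matrix; counts are made up for illustration.
toy_tdm_m <- matrix(
  c(12, 1,
     2, 9,
     7, 3,
     0, 6),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("clean", "dirty", "host", "noisy"),
    c("positive", "negative")
  )
)

# comparison.cloud() expects terms in rows and one column per document.
# Each term is drawn on the side of the document where it is
# relatively more frequent.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::comparison.cloud(
    toy_tdm_m,
    max.words = 20,                      # assumed cap, not from the exercise
    colors = c("darkgreen", "darkred")   # assumed colors, one per column
  )
}
```

The column names set earlier matter here because comparison.cloud() uses them as the labels for each half of the plot.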

Change the pre-loaded all_tdm to a matrix called all_tdm_m using as.matrix().
Use colnames() on all_tdm_m to set the column names to c("positive", "negative").
Apply order() to the first column, all_tdm_m[, 1], with decreasing = TRUE, and store the resulting index vector.
Review the top 10 positive terms by indexing all_tdm_m with that vector, then piping (%>%) the result to head() with n = 10.
Repeat the previous two steps for the negative comments. This time, apply order() to the second column, all_tdm_m[, 2], with decreasing = TRUE, and store the result as order_by_neg.
Review the 10 most frequent negative terms by indexing all_tdm_m with order_by_neg. Pipe this to head() with n = 10.
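
The ordering steps above can be sketched on a small toy matrix. The matrix toy_tdm_m and its counts are hypothetical, and order_by_pos is an assumed name chosen to mirror order_by_neg. The sketch calls head() directly so it runs in base R; the piped form is shown in a comment.

```r
# Toy stand-in for all_tdm_m; the counts are made up for illustration.
toy_tdm_m <- matrix(
  c(12, 1,
     2, 9,
     7, 3,
     0, 6),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("clean", "dirty", "host", "noisy"),
    c("positive", "negative")
  )
)

# order() returns row indices (not sorted values), largest count first.
order_by_pos <- order(toy_tdm_m[, 1], decreasing = TRUE)  # assumed name
order_by_neg <- order(toy_tdm_m[, 2], decreasing = TRUE)

# Indexing rows by those indices reorders the whole matrix.
# With magrittr loaded, the first call could instead be written as:
#   toy_tdm_m[order_by_pos, ] %>% head(n = 10)
top_pos <- head(toy_tdm_m[order_by_pos, ], n = 10)
top_neg <- head(toy_tdm_m[order_by_neg, ], n = 10)

print(top_pos)
print(top_neg)
```

Because order() yields indices, the same vector reorders both columns together, so each row still shows a term's count in both documents.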
