Exercise

Scaled Comparison Cloud

Recall the "grade inflation" of polarity scores on the rental reviews? One way to uncover further insight is to scale the scores so that their mean is 0, and then subset the corpus. Because the mean shifts to 0, some previously positive comments may move into the negative subset, or vice versa. In this exercise you will scale the scores and then re-plot the comparison.cloud(). Removing the "grade inflation" can reveal additional insights.
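To see why re-centering moves comments across the zero line, here is a minimal base R sketch. The polarity values are made up for illustration and are not taken from the rental reviews.

```r
# Hypothetical polarity scores, skewed positive to mimic "grade inflation"
polarity <- c(0.9, 0.6, 0.4, 0.2, 0.1, -0.1)

# scale() subtracts the mean and divides by the standard deviation.
# It returns a one-column matrix, so as.numeric() flattens it to a vector.
scaled_polarity <- as.numeric(scale(polarity))

# Raw scores: almost everything counts as "positive"
sum(polarity > 0)         # 5
sum(polarity < 0)         # 1

# Scaled scores: the split is now relative to the average review
sum(scaled_polarity > 0)  # 3
sum(scaled_polarity < 0)  # 3
```

In this toy example the mean raw score is 0.35, so the comments scored 0.2 and 0.1 switch from the positive group to the negative group after scaling.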

Previously you applied polarity() to the bos_reviews$comments and created a comparison.cloud(). In this exercise you will scale() the outcome before creating the comparison.cloud(). See if this shows something different in the visual!

Since this is largely a review exercise, much of the code already exists; just fill in the correct objects and parameters!

Instructions
  • Review a section of the pre-loaded bos_pol$all while indexing [1:6,1:3].
  • Add a new column to bos_reviews called scaled_polarity with scale() applied to the polarity score column bos_pol$all$polarity.
  • For positive comments, subset() where the new column bos_reviews$scaled_polarity is greater than (>) zero.
  • For negative comments, subset() where the new column bos_reviews$scaled_polarity is less than (<) zero.
  • Create pos_terms using paste() on pos_comments.
  • Now create neg_terms with paste() on neg_comments.
  • Combine the collapsed documents, pos_terms and neg_terms, into a single object called all_terms.
  • Follow the usual tm workflow by nesting VectorSource() inside VCorpus(), applied to all_terms.
  • Make the TermDocumentMatrix() using the all_corpus object. Note that this is a TfIdf-weighted TDM with basic cleaning functions.
  • Change all_tdm to all_tdm_m using as.matrix(). Then rename the columns in the existing code to "positive" and "negative".
  • Finally, apply comparison.cloud() to the matrix object all_tdm_m. Take note of the new most frequent negative words. Maybe it will uncover an unknown insight!
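The steps above can be sketched end to end as follows. This is a rough illustration and not the exercise's own solution code. It makes several assumptions: the comments and polarity scores are hypothetical stand-ins for the pre-loaded bos_reviews and bos_pol$all$polarity (so the bos_pol$all[1:6, 1:3] review step is skipped), the "basic cleaning functions" are guessed to be punctuation and stopword removal, and the max.words and colors arguments are arbitrary choices.

```r
library(tm)
library(wordcloud)

# Hypothetical stand-in for the pre-loaded bos_reviews (made-up comments)
bos_reviews <- data.frame(
  comments = c(
    "Great location and a wonderful friendly host",
    "Lovely clean apartment with a comfortable bed",
    "Nice place close to the subway",
    "The room was fine but parking is difficult",
    "Okay stay although the street noise was loud",
    "Small bathroom and a dirty kitchen"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for bos_pol$all$polarity (made-up scores)
polarity_scores <- c(0.9, 0.7, 0.5, 0.2, 0.1, -0.3)

# Add the scaled column; as.numeric() flattens the matrix scale() returns
bos_reviews$scaled_polarity <- as.numeric(scale(polarity_scores))

# Split the comments around the new zero point
pos_comments <- subset(bos_reviews$comments, bos_reviews$scaled_polarity > 0)
neg_comments <- subset(bos_reviews$comments, bos_reviews$scaled_polarity < 0)

# Collapse each group into one document
pos_terms <- paste(pos_comments, collapse = " ")
neg_terms <- paste(neg_comments, collapse = " ")

# Combine the two documents and build the corpus
all_terms <- c(pos_terms, neg_terms)
all_corpus <- VCorpus(VectorSource(all_terms))

# TfIdf-weighted TDM; the cleaning options here are assumptions
all_tdm <- TermDocumentMatrix(
  all_corpus,
  control = list(
    weighting = weightTfIdf,
    removePunctuation = TRUE,
    stopwords = stopwords(kind = "en")
  )
)

# Convert to a matrix and label the two document columns
all_tdm_m <- as.matrix(all_tdm)
colnames(all_tdm_m) <- c("positive", "negative")

# Plot; max.words and colors are illustrative values
comparison.cloud(all_tdm_m, max.words = 20, colors = c("darkgreen", "darkred"))
```

With only two documents, TfIdf weighting assigns zero weight to any term that appears in both, so the cloud emphasizes words unique to each side. In this toy data, "parking" and "noise" land in the negative group even though their raw polarity scores were above zero.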