Divide & conquer: Using polarity for a comparison cloud

Now that you have seen how polarity can be used to divide a corpus, let's do it! This exercise walks you through dividing a corpus by sentiment so you can examine the information in each subset instead of in the corpus as a whole.

Your R session already contains oz_pol, which was created by applying polarity() to "The Wonderful Wizard of Oz."

For simplicity's sake, we created a custom function called pol_subsections() that divides the corpus by polarity score. The function accepts a data frame in which each row is a sentence or document of the corpus. The text is subset twice: once where the polarity value is greater than 0 and once where it is less than 0, so sentences with zero polarity are left out. Each subset is then pasted together with the collapse parameter, so all positive sentences become one long string and all negative sentences become another. Finally, the two strings are concatenated into a single character vector holding two distinct documents.

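pol_subsections <- function(df) {
  # Keep the text of sentences with positive or negative polarity (neutral rows are dropped)
  x.pos <- subset(df$text, df$polarity > 0)
  x.neg <- subset(df$text, df$polarity < 0)
  # Collapse each subset into a single document
  x.pos <- paste(x.pos, collapse = " ")
  x.neg <- paste(x.neg, collapse = " ")
  # Length-2 character vector: positive document first, negative document second
  all.terms <- c(x.pos, x.neg)
  return(all.terms)
}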
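The same split can be sketched in base R with split() and vapply() instead of two subset() calls. This is an alternative written for illustration, not the course's function. The toy data frame toy_df and the name all_terms_alt are hypothetical. As in pol_subsections(), rows with a polarity of 0 are dropped, and so are rows with NA.

```r
# Hypothetical toy data standing in for a sentence-level data frame like oz_df
toy_df <- data.frame(
  text = c("what a lovely day", "the witch was wicked",
           "they walked on", "dorothy was glad"),
  polarity = c(0.5, -0.6, 0, 0.4),
  stringsAsFactors = FALSE
)

# Keep only non-neutral, non-missing rows, as pol_subsections() does
keep <- !is.na(toy_df$polarity) & toy_df$polarity != 0

# Label each kept sentence; fixing the level order keeps "positive" first
groups <- factor(ifelse(toy_df$polarity[keep] > 0, "positive", "negative"),
                 levels = c("positive", "negative"))

# Collapse each group's sentences into a single document
all_terms_alt <- vapply(split(toy_df$text[keep], groups),
                        paste, character(1), collapse = " ")
print(all_terms_alt)
```

Unlike pol_subsections(), this version returns a named vector, so it stays clear which document is positive and which is negative.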

With the neutral sentences omitted, you can focus on organizing the remaining text. As before, this exercise uses the %>% operator to pass each object forward to the next function. After some simple cleaning, use comparison.cloud() to make the visual.
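comparison.cloud() from the wordcloud package expects a term-frequency matrix in which rows are words and columns are documents. The sketch below builds such a matrix for the two polarity documents in plain base R, as a stand-in for tm's cleaning functions and TermDocumentMatrix(). The toy text and the short stop_words list are hypothetical. The final comparison.cloud() call is commented out so the block runs without wordcloud installed.

```r
# Two documents as pol_subsections() would return them (hypothetical toy text)
all_terms <- c("What a lovely day! Dorothy was glad.",
               "The Witch was wicked.")

# Simple cleaning: strip punctuation, lower-case, split on whitespace
tokens <- lapply(all_terms, function(doc) {
  doc <- tolower(gsub("[[:punct:]]", "", doc))
  words <- strsplit(doc, "\\s+")[[1]]
  words[nzchar(words)]
})

# Hypothetical tiny stopword list; tm::stopwords("en") is the usual choice
stop_words <- c("a", "the", "was", "on")
tokens <- lapply(tokens, function(w) w[!w %in% stop_words])

# Term-frequency matrix: one row per term, one column per document
vocab <- sort(unique(unlist(tokens)))
all_tdm_m <- vapply(tokens,
                    function(w) as.numeric(table(factor(w, levels = vocab))),
                    numeric(length(vocab)))
dimnames(all_tdm_m) <- list(vocab, c("positive", "negative"))
print(all_tdm_m)

# With wordcloud installed, the matrix can be passed straight in:
# wordcloud::comparison.cloud(all_tdm_m, colors = c("darkgreen", "darkred"), max.words = 50)
```

The column names matter: comparison.cloud() uses them to label each side of the cloud.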

Instructions
  • Extract the columns you need from oz_pol and save the result as oz_df.
    • Call select(), naming the first column text and setting it to text.var, the raw text. Name the second column polarity and set it to the polarity scores column, polarity.
  • Now apply pol_subsections() to oz_df and call the new object all_terms.
  • To create all_corpus, apply VectorSource() to all_terms and then pipe (%>%) the result to VCorpus().
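The three steps above can be sketched end to end as follows. This sketch assumes the dplyr and tm packages are installed. It also assumes oz_pol's per-sentence scores sit in a data frame with text.var and polarity columns, which is taken here to be the `all` element of qdap's polarity() output. The data frame oz_scores is a hypothetical stand-in for that table, not the real Oz data.

```r
library(dplyr)
library(tm)

# Hypothetical stand-in for the per-sentence scores stored in oz_pol
oz_scores <- data.frame(
  text.var = c("what a lovely day", "the witch was wicked",
               "they walked on", "dorothy was glad"),
  polarity = c(0.5, -0.6, 0, 0.4),
  stringsAsFactors = FALSE
)

# pol_subsections() from above, condensed so this block runs on its own
pol_subsections <- function(df) {
  x.pos <- paste(subset(df$text, df$polarity > 0), collapse = " ")
  x.neg <- paste(subset(df$text, df$polarity < 0), collapse = " ")
  c(x.pos, x.neg)
}

# Step 1: keep the raw text as `text` and the scores as `polarity`
oz_df <- oz_scores %>%
  select(text = text.var, polarity = polarity)

# Step 2: one positive document and one negative document
all_terms <- pol_subsections(oz_df)

# Step 3: wrap the two documents in a volatile corpus
all_corpus <- all_terms %>%
  VectorSource() %>%
  VCorpus()

print(all_corpus)
```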