Exercise

Compare & contrast stacked bar chart

Another way to slice your text is to understand how much of a document is made up of positive or negative words. For example, a restaurant review may note some positive aspects, such as "the food was good," but then continue with "the restaurant was dirty, the staff was rude, and parking was awful." As a result, you may want to understand how much of a document is dedicated to positive versus negative language. In this example, the review would have a higher negative percentage than positive.

One method for doing so is to count() the positive and negative words, then divide by the total number of subjectivity words identified. In the restaurant review example, "good" counts as 1 positive term, while "dirty," "rude," and "awful" count as 3 negative terms. With 4 subjectivity terms overall, a simple calculation shows the review is 25% positive and 75% negative.
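As a quick sketch of that arithmetic in R, suppose the review's subjectivity words have already been labeled (the review_words data frame below is made up for illustration):

  library(dplyr)

  # Hypothetical tidy data: one row per subjectivity word in the review
  review_words <- tibble(
    word      = c("good", "dirty", "rude", "awful"),
    sentiment = c("positive", "negative", "negative", "negative")
  )

  review_words %>%
    count(sentiment) %>%                 # tally words per sentiment
    mutate(percent = n / sum(n) * 100)   # positive: 25%, negative: 75%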

Start by performing an inner_join() on a unified tidy data frame containing four books: Agamemnon, Oz, Huck Finn, and Moby Dick. Just like the previous exercise, you will use filter() and grepl().

To perform the count(), you have to group the data by book and then by sentiment. For example, all the positive words for Agamemnon have to be grouped and tallied so that positive words from the other books are not mixed in. Luckily, you can pass multiple variables into count() directly.
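To see what passing multiple variables into count() does, here is a toy illustration (the data frame is made up for demonstration):

  library(dplyr)

  words <- tibble(
    book      = c("Agamemnon", "Agamemnon", "Oz"),
    sentiment = c("positive", "positive", "negative")
  )

  # count(book, sentiment) groups by both columns before tallying,
  # shorthand for group_by(book, sentiment) followed by a tally
  words %>% count(book, sentiment)
  # book       sentiment   n
  # Agamemnon  positive    2
  # Oz         negative    1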

Instructions 1/3
  • Inner join all_books to the lexicon, nrc.
  • Filter to keep rows where sentiment is "positive" or "negative". That is, use grepl() on the sentiment column with the pattern "positive|negative", this time without the negation, so that matching rows are kept.
  • Count by book and sentiment; a combined sketch follows below.
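Putting the steps together, a minimal sketch might look like the following; all_books and nrc are assumed to be provided by the exercise environment:

  library(dplyr)

  # all_books: tidy text, one row per word, with a book column
  # nrc: the NRC lexicon, one row per (word, sentiment) pair
  all_books %>%
    inner_join(nrc, by = "word") %>%                    # attach sentiment labels
    filter(grepl("positive|negative", sentiment)) %>%   # keep the two polarity classes
    count(book, sentiment)                              # tally within each book

Because count() accepts multiple grouping variables, no explicit group_by() call is needed.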