Compare & contrast stacked bar chart
Another way to slice your text is to understand how much of a document is made up of positive or negative words. For example, a restaurant review may contain some positive language such as "the food was good" but then continue with "the restaurant was dirty, the staff was rude, and parking was awful." As a result, you may want to measure how much of a document is dedicated to positive versus negative language. This example would have a higher negative percentage than positive.
One method for doing so is to count() the positive and negative words, then divide each by the total number of subjectivity words identified. In the restaurant review example, "good" counts as 1 positive term, while "dirty," "rude," and "awful" count as 3 negative terms. With 4 subjectivity terms in total, a simple calculation shows the review is 25% positive (1/4) and 75% negative (3/4).
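As a minimal sketch of that arithmetic in R, using hypothetical vectors holding the matched terms from the review:

# Hypothetical vectors of subjectivity terms found in one review
positive_terms <- c("good")
negative_terms <- c("dirty", "rude", "awful")

# Total subjectivity terms identified
n_total <- length(positive_terms) + length(negative_terms)

# Share of positive vs. negative language
length(positive_terms) / n_total  # 0.25, i.e. 25% positive
length(negative_terms) / n_total  # 0.75, i.e. 75% negative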
Start by performing an inner_join() on a unified tidy data frame containing 4 books: Agamemnon, Oz, Huck Finn, and Moby Dick. Just like the previous exercise, you will use filter() and grepl().
To perform the count(), you have to group the data by book and then by sentiment. For example, all the positive words for Agamemnon have to be grouped and then tallied so that positive words from the different books are not mixed. Luckily, you can pass multiple grouping variables into count() directly.
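As a quick illustration of that shorthand, here is a minimal sketch on a hypothetical toy data frame (not the exercise's all_books):

library(dplyr)

# Hypothetical tidy data: one row per term, labeled by book & sentiment
toy <- data.frame(
  book      = c("Agamemnon", "Agamemnon", "Oz", "Oz"),
  sentiment = c("positive", "negative", "positive", "positive")
)

# count() groups by both variables and tallies in one step...
toy %>% count(book, sentiment)

# ...equivalent to the longer group_by() + tally() pipeline
toy %>% group_by(book, sentiment) %>% tally()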
The completed sample code, assuming all_books and the nrc lexicon data frame are already loaded in the exercise environment:
# dplyr provides %>%, inner_join(), filter(), and count()
library(dplyr)

# Review tail of all_books
tail(all_books)

# Count by book & sentiment
books_sent_count <- all_books %>%
  # Inner join to the nrc lexicon (assumed preloaded, with columns word & sentiment)
  inner_join(nrc, by = c("term" = "word")) %>%
  # Keep only the positive or negative sentiment rows
  filter(grepl("positive|negative", sentiment)) %>%
  # Count by book and by sentiment in one call
  count(book, sentiment)

# Review entire object
books_sent_count
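Tying this back to the percentage idea above, one possible follow-up (a sketch, not part of the exercise itself) converts the counts into per-book proportions:

# Sketch: share of positive vs. negative terms within each book
# count() stores tallies in a column named n
books_sent_count %>%
  group_by(book) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()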