
Sentiment analysis

1. Sentiment analysis

Hello again, and welcome to the last chapter of Introduction to NLP in R. In Chapter 4, I want to introduce you to a few additional techniques that are essential tools for text analysis.

2. Sentiment analysis

As this is an introductory course, there is no way we could go four chapters without looking at sentiment analysis. Sentiment analysis is used to assess the subjective information in text. It could be used to understand how positive or negative a text is, or whether the words used elicit some form of emotion. To perform sentiment analysis, you start with a dictionary of words that each have a predefined value or score. For example, abandon might be related to fear, while accomplish might be related to joy.

3. Tidytext sentiments

In tidytext, the sentiments dataset contains 3 such dictionaries for us to use. Each word in a dictionary is assigned either a sentiment, such as trust or fear, or a score. We will explore examples of using both for sentiment analysis.

4. 3 lexicons

Let's look at these three different dictionaries, or lexicons. One (AFINN) contains scores from -5 to 5 describing how positive or negative each word is, one (bing) gives just a binary label of positive or negative, and the last one (nrc) labels each word with categories such as fear or joy. To access the data, we can use the get_sentiments function with the name of the lexicon of interest.
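As a rough sketch of that call, assuming a recent tidytext release: the bing lexicon ships with the package, while afinn and nrc are fetched through the textdata package and may prompt you to accept a license the first time, so those two calls are left commented out here.

```r
library(tidytext)

# bing ships with tidytext: columns `word` and `sentiment` (positive/negative).
bing <- get_sentiments("bing")
head(bing)

# In recent tidytext releases, afinn and nrc are downloaded via the textdata
# package; the first call may ask you to accept the lexicon's license.
# afinn <- get_sentiments("afinn")  # columns `word`, `value` (-5 to 5)
# nrc   <- get_sentiments("nrc")    # columns `word`, `sentiment` (fear, joy, ...)
```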

5. Prepare your data.

Before we use these lexicons, we need to prepare our text. Essentially, all we need is for the words of our data to be tokens, which is something we have done several times. In this example, we read the words of Animal Farm. Note that we have not performed stemming, as words might create different feelings than just their stem.
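A minimal sketch of that tokenization step might look like the following. The `animal_farm` tibble and its `text_column` are hypothetical stand-ins with made-up sentences, not the course's actual dataset.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical stand-in for the Animal Farm data: one row per chapter.
animal_farm <- tibble(
  chapter = c("Chapter 1", "Chapter 2"),
  text_column = c("The animals were happy.", "The winter was cold and bitter.")
)

# One row per word; unnest_tokens() lowercases and drops punctuation by default.
# No stemming is applied, so each word keeps its original form.
animal_farm_tokens <- animal_farm %>%
  unnest_tokens(word, text_column)

animal_farm_tokens
```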

6. The afinn lexicon

Once we have tokens, we can join the words with the sentiment from the sentiments dataset by using an inner join. The result is a new tibble with each word and its sentiment. Since the AFINN lexicon has scores from -5 to 5, we see the scores as a column.
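To illustrate the join, here is a small sketch with made-up data. `toy_afinn` is a hypothetical stand-in for `get_sentiments("afinn")` with illustrative values; it only mirrors the lexicon's `word` and `value` columns.

```r
library(dplyr)
library(tibble)

# Hypothetical tokens, one word per row, as produced by tokenization.
tokens <- tibble(
  chapter = c("Chapter 1", "Chapter 1", "Chapter 7", "Chapter 7"),
  word = c("happy", "farm", "death", "winter")
)

# Toy stand-in for get_sentiments("afinn"); the values are illustrative only.
toy_afinn <- tibble(word = c("happy", "death"), value = c(3, -2))

# inner_join() keeps only the words that appear in both tables,
# so unscored words such as "farm" and "winter" are dropped.
scored <- tokens %>%
  inner_join(toy_afinn, by = "word")

scored
```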

7. afinn continued

We can take this a step further by grouping the sentiments by chapter and summarizing the overall score. The result, with chapter 7 coming out as the most negative, makes a lot of sense, as chapter 7 of Animal Farm is all about a cold winter in which the animals struggle to rebuild their farm.
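The grouping and summarizing step can be sketched as follows. The tokens and the `toy_afinn` scores are invented for illustration, so the totals say nothing about the real book.

```r
library(dplyr)
library(tibble)

# Hypothetical tokens; the real data would have many rows per chapter.
tokens <- tibble(
  chapter = c(rep("Chapter 1", 3), rep("Chapter 7", 4)),
  word = c("happy", "hope", "farm", "death", "starve", "hope", "winter")
)

# Toy stand-in for get_sentiments("afinn"); values are illustrative only.
toy_afinn <- tibble(
  word = c("happy", "hope", "death", "starve"),
  value = c(3, 2, -2, -2)
)

# Sum the word scores within each chapter, most negative chapter first.
chapter_scores <- tokens %>%
  inner_join(toy_afinn, by = "word") %>%
  group_by(chapter) %>%
  summarize(sentiment = sum(value)) %>%
  arrange(sentiment)

chapter_scores
```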

8. The bing lexicon

The bing lexicon is similar to AFINN, but this time words are labeled as strictly positive or negative. Instead of summarizing the scores, we just need to count the words used. First we find the total words used by chapter. Next, we count how many negative words were used. Here we are also filtering to just the negative words used. And finally, we will append a column representing the proportion of negative words used. We find similar results, with chapter 7 containing the highest proportion of negative words used at almost 12%.
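The three steps above (total words, negative words, proportion) can be sketched with toy data. `toy_bing` is a hypothetical stand-in for `get_sentiments("bing")`, and the column names `total_words`, `negative_words`, and `percentage` are choices made for this example.

```r
library(dplyr)
library(tibble)

# Hypothetical tokens: four words in each of two chapters.
tokens <- tibble(
  chapter = c(rep("Chapter 1", 4), rep("Chapter 7", 4)),
  word = c("happy", "animals", "cold", "barn",
           "cold", "hunger", "and", "snow")
)

# Toy stand-in for get_sentiments("bing").
toy_bing <- tibble(
  word = c("happy", "cold", "hunger"),
  sentiment = c("positive", "negative", "negative")
)

# Step 1: total words used in each chapter.
word_totals <- tokens %>%
  count(chapter, name = "total_words")

# Steps 2 and 3: count negative words per chapter, then add the proportion.
negative_share <- tokens %>%
  inner_join(toy_bing, by = "word") %>%
  filter(sentiment == "negative") %>%
  count(chapter, name = "negative_words") %>%
  inner_join(word_totals, by = "chapter") %>%
  mutate(percentage = negative_words / total_words) %>%
  arrange(desc(percentage))

negative_share
```

Note that a chapter with no negative words would drop out of this result, since the filter leaves it with no rows to count.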

9. The nrc lexicon

Let's look at what the nrc lexicon offers. It still labels some words as positive or negative, but it also includes fear, anger, trust, and others. We can use this lexicon to see if certain emotions might be in our text.

10. nrc continued

We know that Animal Farm is about animals overthrowing their human rulers, so let's see what words related to fear are in the text. We can use an inner join to find only the words that are labeled as fear, and then count these words. No surprises here: rebellion, death, and gun are the common words related to fear.
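A small sketch of the fear lookup, again with invented data: `toy_nrc` is a hypothetical stand-in for `get_sentiments("nrc")`, and the counts are not the real Animal Farm counts. Filtering the lexicon to fear before joining matters because a word can carry several labels in nrc.

```r
library(dplyr)
library(tibble)

# Hypothetical tokens from the text.
tokens <- tibble(
  word = c("rebellion", "death", "rebellion", "gun",
           "friend", "rebellion", "death")
)

# Toy stand-in for get_sentiments("nrc"); a word may have several labels.
toy_nrc <- tibble(
  word = c("rebellion", "death", "gun", "death", "friend"),
  sentiment = c("fear", "fear", "fear", "sadness", "trust")
)

# Keep only the fear entries, so each word matches at most once in the join.
fear <- toy_nrc %>%
  filter(sentiment == "fear")

# Join, then count each fear word, most frequent first.
fear_counts <- tokens %>%
  inner_join(fear, by = "word") %>%
  count(word, sort = TRUE)

fear_counts
```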

11. Sentiment time.

Let's explore sentiment analysis.
