
AFINN & NRC methodologies in more detail

1. AFINN & NRC inner joins

Now we transition to performing inner joins with the AFINN and NRC subjectivity lexicons.

2. AFINN

The AFINN lexicon contains words labeled by Finn Årup Nielsen, a Danish researcher. Rather than positive or negative, now the subjectivity words are labeled with numeric values between minus five and five. Keep in mind there are no neutral or zero-valued words, only minus one to minus five and one to five.

3. NRC

In contrast, the NRC lexicon was created by Saif Mohammad, with labels assigned using a crowdsourcing platform. The NRC lexicon places words in ten classes: positive and negative, plus Plutchik's eight primary emotions.

4. Huckleberry Finn

Using the AFINN lexicon you will perform an inner join with a book called The Adventures of Huckleberry Finn. This is an American classic and largely considered a fairly humorous account of a young boy, Huck, searching for freedom and adventure. The book has been arranged in a tidy format so that every single word is a row. The document column represents the line of the book the word was found on.

5. Huck Finn joined to AFINN

As you perform the inner join between the words and the AFINN lexicon, only words found in both are kept, and each keeps its corresponding value between minus five and five.
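As a rough sketch of what that join looks like, the snippet below uses dplyr with two small made-up tables. The names `huck_tidy` and `afinn_toy`, the column names, and the scores are all assumptions for illustration; they are not the course's objects or the real AFINN values.

```r
library(dplyr)

# Hypothetical tidy text: one word per row; `document` is the line of the book
huck_tidy <- tibble(
  document = c(22, 22, 22, 23, 23),
  term     = c("the", "bad", "good", "river", "happy")
)

# Toy stand-in for the AFINN lexicon (illustrative scores, not the real ones)
afinn_toy <- tibble(
  term  = c("bad", "good", "happy"),
  value = c(-3, 1, 3)
)

# Keep only the words present in both tables, along with their scores
huck_afinn <- inner_join(huck_tidy, afinn_toy, by = "term")
print(huck_afinn)
```

Words with no lexicon entry, such as "the" and "river" here, drop out of the result.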

6. Using summarize()

You will also learn how to aggregate with summarize() in this section. Once the words have been scored with AFINN through the inner join, you will sum the values by book line. For example, suppose a line contains two scored words. The first is valued at negative three and the second at one. Thus, the total score is negative two. In this example, line twenty-two has negative words like "judge" and "took". In this section you will simply add the corresponding values by group, where the group is the "document" column representing the line of the book.
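The arithmetic above can be sketched with dplyr's group_by() and summarize(). The table `scored`, its words, and its values are hypothetical; they only mirror the negative three plus one example.

```r
library(dplyr)

# Hypothetical result of an AFINN inner join: one scored word per row
scored <- tibble(
  document = c(22, 22, 23),
  term     = c("bad", "good", "happy"),
  value    = c(-3, 1, 3)
)

# Sum the word values within each line of the book
line_scores <- scored %>%
  group_by(document) %>%
  summarize(total_value = sum(value))

print(line_scores)
```

Line 22 has two scored words, so its total is -3 + 1 = -2.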

7. Using filter()

Next you will learn the filter() function. This allows you to explore the results in a granular way so you can understand what you accomplished. The filter() function is similar to base R's subset(). In the tidyverse it is preferred because it has a simpler syntax. It works by first passing in the data object and then the condition you want to filter by; only the rows where the condition is true are returned.
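A minimal sketch of that call pattern, assuming a hypothetical table of per-line totals named `line_scores`:

```r
library(dplyr)

# Hypothetical per-line totals
line_scores <- tibble(
  document    = c(21, 22, 23),
  total_value = c(0, -2, 3)
)

# Data object first, then the condition: keep only lines with a negative total
negative_lines <- filter(line_scores, total_value < 0)

# The base R counterpart mentioned above, shown for comparison
negative_lines_base <- subset(line_scores, total_value < 0)

print(negative_lines)
```

Both calls keep the same single row here, the one for line 22.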

8. Plutchik & NRC

In the next exercise you switch to the NRC lexicon. Remember that the NRC lexicon labels words as positive or negative and with eight primary emotions. Specifically, the emotions are joy, trust, fear, surprise, sadness, disgust, anger and anticipation.

9. The Wonderful Wizard of NRC

In this chapter you join the NRC lexicon with the book The Wonderful Wizard of Oz. It's about a girl taken to a magical kingdom, with both happy and dramatic moments as she works to get home. This section of chapter two is meant to introduce you to the different lexicons and to have you perform inner joins a few times. In chapter three you will make visuals based on all three lexicons: Bing, AFINN and NRC. So you will get a chance to do many more inner joins while making more interesting visuals.
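An NRC join might look roughly like the sketch below. Unlike AFINN, one word can carry several labels, so a single word may produce more than one row after the join. The tables `oz_tidy` and `nrc_toy` and the word-to-label pairs in them are made up for illustration and are not taken from the real lexicon.

```r
library(dplyr)

# Hypothetical tidy text from the book: one word per row
oz_tidy <- tibble(
  document = c(1, 1, 2, 3),
  term     = c("cyclone", "home", "witch", "wonderful")
)

# Toy stand-in for NRC: a word may appear once per label (labels are illustrative)
nrc_toy <- tibble(
  term      = c("cyclone", "cyclone", "witch", "witch", "wonderful", "wonderful"),
  sentiment = c("fear", "negative", "anger", "negative", "joy", "positive")
)

# Each matching word yields one row per label it carries
oz_nrc <- inner_join(oz_tidy, nrc_toy, by = "term")

# Tally how often each label occurs
sentiment_counts <- count(oz_nrc, sentiment)
print(sentiment_counts)
```

Here three matched words expand to six rows, because each has two labels in the toy lexicon.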

10. %in% operator

In the Oz exercise we introduce the %in% operator. This is a matching operator that returns TRUE or FALSE for each element of the left-hand vector, depending on whether that value also appears in the right-hand vector. In this example x contains "text", "mining" and "python", and y contains "text", "tm", "qdap", "R" and "mining". If you check "is x in y?" then %in% returns TRUE, TRUE and FALSE because "text" and "mining" are in both x and y but "python" is not. If you reverse and check y in x you get TRUE, FALSE, FALSE, FALSE and TRUE. To be candid, I always mix up the order of x in y or vice versa, so when you code be sure to check!
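The example can be reproduced directly in base R. The result always has the same length as the vector on the left, which is a handy way to check the order.

```r
x <- c("text", "mining", "python")
y <- c("text", "tm", "qdap", "R", "mining")

# One TRUE/FALSE per element of x: is it found anywhere in y?
x_in_y <- x %in% y
print(x_in_y)

# Reversed: one TRUE/FALSE per element of y
y_in_x <- y %in% x
print(y_in_x)
```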

11. Let's practice!

Get to it!
