AFINN: I'm your Huckleberry
Now we transition to the AFINN lexicon. The AFINN lexicon scores words numerically from -5 to 5, rather than labeling them only as positive or negative. Unlike the Bing lexicon, whose column is called sentiment, the AFINN lexicon's sentiment score column is called value.
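To make the column difference concrete, here is a minimal sketch with two hand-typed toy tables. The names toy_bing and toy_afinn are hypothetical stand-ins, not the real lexicon objects, and the rows are illustrative only (the two numeric scores are the ones quoted later in this exercise).

```r
library(dplyr)

# Illustrative rows only, hand-typed for this sketch (not the full lexicons)
toy_bing <- tibble(
  word      = c("best", "fun"),
  sentiment = c("positive", "positive")
)

toy_afinn <- tibble(
  word  = c("best", "fun"),
  value = c(3, 4)
)

names(toy_bing)   # "word" "sentiment"
names(toy_afinn)  # "word" "value"

# Because value is numeric, it can be aggregated directly
sum(toy_afinn$value)
```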
As before, you apply inner_join() then count(). Next, to sum the scores of each line, we use dplyr's group_by() and summarize() functions. The group_by() function takes an existing data frame and converts it into a grouped data frame where operations are performed "by group". Then, the summarize() function lets you calculate a value for each group in your data frame using a function that aggregates data, like sum() or mean(). So, in our case we can do something like:
data_frame %>%
  group_by(book_line) %>%
  # Sum the score column within each line, not the grouping column itself
  summarize(total_value = sum(value))
In the tidy version of Huckleberry Finn, line 9703 contains the words "best", "ever", "fun", "life" and "spirit". Of these, "best" and "fun" have AFINN scores of 3 and 4, respectively, so after aggregating, line 9703 will have a total score of 7.
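The line 9703 arithmetic can be reproduced with a small sketch. Everything here is a toy stand-in: toy_text and toy_afinn are hypothetical names, and toy_afinn holds only the two scores quoted above rather than the full lexicon. Multiplying value by the n column that count() creates is an assumption of this sketch, made so that a scored word appearing twice in a line would be counted twice.

```r
library(dplyr)

# Toy stand-in for the tidy text: one row per word, with its line number
toy_text <- tibble(
  line = c(9703, 9703, 9703, 9703, 9703),
  word = c("best", "ever", "fun", "life", "spirit")
)

# Toy stand-in for the lexicon (only the two scores quoted in the text)
toy_afinn <- tibble(
  word  = c("best", "fun"),
  value = c(3, 4)
)

line_scores <- toy_text %>%
  inner_join(toy_afinn, by = "word") %>%   # keep only words that have a score
  count(value, line) %>%                   # adds n: occurrences per value and line
  group_by(line) %>%
  summarize(total_value = sum(value * n))  # weight each score by its count

line_scores  # one row: line 9703, total_value 7
```

Words with no match in the toy lexicon ("ever", "life", "spirit") are dropped by the inner join, so they contribute nothing to the total.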
In the tidyverse, filter() is preferred to subset() because it combines the functionality of subset() with simpler syntax. Here is an example that uses filter() to keep the rows of data_frame where the value in column1 is equal to 24. Notice the column name is not in quotes.
filter(data_frame, column1 == 24)
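As a runnable illustration of the call above, the sketch below builds a small made-up table, toy_df (a hypothetical name playing the role of data_frame), and compares filter() with base R's subset().

```r
library(dplyr)

# Made-up data for illustration only
toy_df <- tibble(
  column1 = c(12, 24, 24, 36),
  column2 = c("a", "b", "c", "d")
)

# Keep only the rows where column1 equals 24; note the bare column name
kept <- filter(toy_df, column1 == 24)

# The base R equivalent selects the same rows
kept_base <- subset(toy_df, column1 == 24)

kept  # two rows, with column2 values "b" and "c"
```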
The afinn object contains the AFINN lexicon. The huck object is a tidy version of Mark Twain's Adventures of Huckleberry Finn for analysis.
Line 5400 is "All the loafers looked glad; I reckoned they was used to having fun out of Boggs." Stopwords and punctuation have already been removed in the dataset.
This exercise is part of the course Sentiment Analysis in R.
Have a go at this exercise by completing this sample code.
# See abbreviated line 5400
huck %>% filter(line == 5400)
# What are the scores of the sentiment words?
afinn %>% filter(word %in% c("fun", "glad"))
huck_afinn <- huck %>%
  # Inner join to the AFINN lexicon
  inner_join(___, by = c("___" = "___")) %>%
  # Count by value and line
  ___(___, ___)