AFINN: I'm your Huckleberry
Now we transition to the AFINN lexicon. The AFINN lexicon has numeric values from 5 to -5, not just positive or negative. Unlike the Bing lexicon's sentiment, the AFINN lexicon's sentiment score column is called value.
As before, you apply inner_join() then count(). Next, to sum the scores of each line, we use dplyr's group_by() and summarize() functions. The group_by() function takes an existing data frame and converts it into a grouped data frame where operations are performed "by group". Then, the summarize() function lets you calculate a value for each group in your data frame using a function that aggregates data, like sum() or mean(). So, in our case we can do something like
data_frame %>%
group_by(book_line) %>%
summarize(total_value = sum(book_line))
In the tidy version of Huckleberry Finn, line 9703 contains words "best", "ever", "fun", "life" and "spirit". "best" and "fun" have AFINN scores of 3 and 4 respectively. After aggregating, line 9703 will have a total score of 7.
In the tidyverse, filter() is preferred to subset() because it combines the functionality of subset() with simpler syntax. Here is an example that filter()s data_frame where some value in column1 is equal to 24. Notice the column name is not in quotes.
filter(data_frame, column1 == 24)
The afinn object contains the AFINN lexicon. The huck object is a tidy version of Mark Twain's Adventures of Huckleberry Finn for analysis.
Line 5400 is All the loafers looked glad; I reckoned they was used to having fun out of Boggs. Stopwords and punctuation have already been removed in the dataset.
Deze oefening maakt deel uit van de cursus
Sentiment Analysis in R
Praktische interactieve oefening
Probeer deze oefening eens door deze voorbeeldcode in te vullen.
# See abbreviated line 5400
huck %>% filter(line == 5400)
# What are the scores of the sentiment words?
afinn %>% filter(word %in% c("fun", "glad"))
huck_afinn <- huck %>%
# Inner Join to AFINN lexicon
inner_join(___, by = c("___" = "___")) %>%
# Count by value and line
___(___, ___)