AFINN: I'm your Huckleberry
Now we transition to the AFINN lexicon. The AFINN lexicon has numeric values from 5 to -5, not just positive or negative. Unlike the Bing lexicon's sentiment, the AFINN lexicon's sentiment score column is called value.
As before, you apply inner_join() then count(). Next, to sum the scores of each line, we use dplyr's group_by() and summarize() functions. The group_by() function takes an existing data frame and converts it into a grouped data frame where operations are performed "by group". Then, the summarize() function lets you calculate a value for each group in your data frame using a function that aggregates data, like sum() or mean(). So, in our case we can do something like
data_frame %>%
group_by(book_line) %>%
summarize(total_value = sum(book_line))
In the tidy version of Huckleberry Finn, line 9703 contains words "best", "ever", "fun", "life" and "spirit". "best" and "fun" have AFINN scores of 3 and 4 respectively. After aggregating, line 9703 will have a total score of 7.
In the tidyverse, filter() is preferred to subset() because it combines the functionality of subset() with simpler syntax. Here is an example that filter()s data_frame where some value in column1 is equal to 24. Notice the column name is not in quotes.
filter(data_frame, column1 == 24)
The afinn object contains the AFINN lexicon. The huck object is a tidy version of Mark Twain's Adventures of Huckleberry Finn for analysis.
Line 5400 is All the loafers looked glad; I reckoned they was used to having fun out of Boggs. Stopwords and punctuation have already been removed in the dataset.
Este exercício faz parte do curso
Sentiment Analysis in R
Exercício interativo prático
Experimente este exercício completando este código de exemplo.
# See abbreviated line 5400
huck %>% filter(line == 5400)
# What are the scores of the sentiment words?
afinn %>% filter(word %in% c("fun", "glad"))
huck_afinn <- huck %>%
# Inner Join to AFINN lexicon
inner_join(___, by = c("___" = "___")) %>%
# Count by value and line
___(___, ___)