Box plot
An easy way to compare multiple distributions is with a box plot. This code will help you construct multiple box plots to make a compact visual.
In this exercise the all_book_polarity object is already loaded. The data frame contains two columns, book and polarity. It holds the scores produced by applying qdap's polarity() function to all of the books. Here are the first 3 rows of the large object.
| | book | polarity |
|---|---|---|
| 14 | huck | 0.2773501 |
| 22 | huck | 0.2581989 |
| 26 | huck | -0.5773503 |
This exercise introduces tapply(), which applies a function over a ragged array. You pass in a vector of values and then a factor of the same length. The values are split into groups by factor level, and the third argument, a function like min(), is applied to each group. For example, here is some code with tapply() used on two vectors.
f1 <- as.factor(c("Group1", "Group2", "Group1", "Group2"))
stat1 <- c(1, 2, 1, 2)
tapply(stat1, f1, sum)
The result is an array where Group1 has a value of 2 (1+1) and Group2 has a value of 4 (2+2).
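"Ragged" means the groups do not need to be the same size. Here is a minimal sketch of that case, using made-up values; the names scores and groups are illustrative and not part of the exercise.

```r
# Hypothetical toy data: three values fall in group "a", one in group "b"
scores <- c(0.5, -0.2, 0.1, 0.9)
groups <- factor(c("a", "a", "a", "b"))

# Apply median() to the values within each factor level
group_medians <- tapply(scores, groups, median)
print(group_medians)
```

Any function that takes a vector, such as mean() or summary(), can be swapped in for median().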
This exercise is part of the course
Sentiment Analysis in R
Exercise instructions
- Since it's already loaded, examine all_book_polarity with str().
- Using tapply(), pass in all_book_polarity$polarity, all_book_polarity$book and the summary() function. This will print the summary statistics for the 4 books in terms of their polarity() scores. You would expect Oz and Huck Finn to have higher averages than Agamemnon or Moby Dick. Pay close attention to the median.
- Create a box plot with ggplot() by passing in all_book_polarity.
  - Aesthetics should be aes(x = book, y = polarity).
  - Using a + add the geom_boxplot() with col = "darkred". Pay close attention to the dark line in each box representing median.
  - Next add another layer called geom_jitter() to add points for each of the words.
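The first two steps can be tried on a small stand-in before running them on the real object. This is only a sketch: toy_polarity and its values are invented for illustration and are not the contents of all_book_polarity.

```r
# Hypothetical stand-in for all_book_polarity (values are made up)
toy_polarity <- data.frame(
  book = c("huck", "huck", "huck", "oz", "oz"),
  polarity = c(0.3, 0.2, -0.6, 0.5, 0.1),
  stringsAsFactors = FALSE
)

# Inspect the structure: column names, types, and first values
str(toy_polarity)

# Summary statistics of polarity, computed separately for each book
by_book <- tapply(toy_polarity$polarity, toy_polarity$book, summary)
print(by_book)
```

Each element of by_book is the summary() output for one book, so the median for a single book can be pulled out with, for example, by_book[["huck"]][["Median"]].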
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Examine
___
# Summary by document
___
# Box plot
ggplot(___, aes(x = ___, y = ___)) +
___(fill = c("#bada55", "#F00B42", "#F001ED", "#BA6E15"), col = "___") +
___(position = position_jitter(width = 0.1, height = 0), alpha = 0.02) +
theme_gdocs() +
ggtitle("Book Polarity")
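To see how the layers combine, here is a rough sketch on made-up data, assuming the ggplot2 package is installed. The toy_polarity data frame is hypothetical, the fill vector is left out because the toy data has only two books, and theme_gdocs() is omitted because it comes from the separate ggthemes package.

```r
library(ggplot2)

# Hypothetical stand-in for all_book_polarity (values are made up)
toy_polarity <- data.frame(
  book = c("huck", "huck", "huck", "oz", "oz", "oz"),
  polarity = c(0.3, 0.2, -0.6, 0.5, 0.1, 0.4)
)

# One box per book, with the raw scores jittered on top
p <- ggplot(toy_polarity, aes(x = book, y = polarity)) +
  # Dark red outlines; the thick line inside each box is the median
  geom_boxplot(col = "darkred") +
  # Jitter horizontally only, so the polarity values are not distorted
  geom_jitter(position = position_jitter(width = 0.1, height = 0), alpha = 0.5) +
  ggtitle("Book Polarity (toy data)")
```

Calling print(p) draws the plot. With only a few points an alpha of 0.5 keeps them visible; a much lower value such as the 0.02 in the template suits a large data set where points overlap heavily.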