Box plot
An easy way to compare multiple distributions is with a box plot. This code will help you construct multiple box plots to make a compact visual.
In this exercise the `all_book_polarity` object is already loaded. The data frame contains two columns, `book` and `polarity`. It comprises all books with `qdap`'s `polarity()` function applied. Here are the first 3 rows of the large object.
| | book | polarity |
|---|---|---|
| 14 | huck | 0.2773501 |
| 22 | huck | 0.2581989 |
| 26 | huck | -0.5773503 |
This exercise introduces `tapply()`, which applies a function over a ragged array. You pass in a vector of values and then a vector of factors. The third argument, a function like `min()`, is applied to the values that fall under each factor level. For example, here is `tapply()` used on two vectors.
f1 <- as.factor(c("Group1", "Group2", "Group1", "Group2"))
stat1 <- c(1, 2, 1, 2)
tapply(stat1, f1, sum)
The result is an array where `Group1` has a value of 2 (1+1) and `Group2` has a value of 4 (2+2).
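The same pattern extends to functions that return more than one value per group. The sketch below is illustrative only: the `toy` data frame and its values are invented for this example and are not the course's `all_book_polarity` object. It shows that a scalar function such as `min()` or `median()` gives one number per factor level, while `summary()` gives a full set of statistics per level.

```r
# Toy data invented for illustration (NOT the course's all_book_polarity)
toy <- data.frame(
  book = factor(c("huck", "huck", "huck", "oz", "oz", "oz")),
  polarity = c(0.2, 0.4, -0.6, 0.5, 0.1, 0.3)
)

# A scalar function yields one value per factor level (a named 1-d array)
group_min <- tapply(toy$polarity, toy$book, min)
group_median <- tapply(toy$polarity, toy$book, median)

# summary() yields several values per level, so the result holds
# one summary object per factor level instead of a single number
group_summary <- tapply(toy$polarity, toy$book, summary)

print(group_min)
print(group_median)
print(group_summary)
```

Because `tapply()` groups by factor level, the two books do not need the same number of rows; that is what "ragged" refers to.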
This exercise is part of the course Sentiment Analysis in R.
Instructions
- Since it's already loaded, examine `all_book_polarity` with `str()`.
- Using `tapply()`, pass in `all_book_polarity$polarity`, `all_book_polarity$book`, and the `summary()` function. This will print the summary statistics for the 4 books in terms of their `polarity()` scores. You would expect Oz and Huck Finn to have higher averages than Agamemnon or Moby Dick. Pay close attention to the median.
- Create a box plot with `ggplot()` by passing in `all_book_polarity`.
  - Aesthetics should be `aes(x = book, y = polarity)`.
  - Using a `+`, add `geom_boxplot()` with `col = "darkred"`. Pay close attention to the dark line in each box, which represents the median.
  - Next, add another layer called `geom_jitter()` to add points for each of the words.
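The layering described in the steps above can be sketched on a small stand-in data set. This is a minimal sketch, assuming the `ggplot2` package is installed; the `toy` data frame and its values are invented here and stand in for `all_book_polarity`, and the `alpha` value is chosen so the few toy points stay visible.

```r
library(ggplot2)

# Toy data invented for illustration (NOT the course's all_book_polarity)
toy <- data.frame(
  book = factor(rep(c("huck", "oz"), each = 5)),
  polarity = c(0.2, 0.4, -0.6, 0.1, 0.3, 0.5, 0.1, 0.3, 0.6, -0.1)
)

# Base plot: one box per book on the x axis, polarity on the y axis
p <- ggplot(toy, aes(x = book, y = polarity)) +
  # Box outline in dark red; the thick line inside each box is the median
  geom_boxplot(col = "darkred") +
  # Overlay the individual observations, jittered horizontally only
  geom_jitter(position = position_jitter(width = 0.1, height = 0), alpha = 0.5) +
  ggtitle("Book Polarity (toy data)")

print(p)
```

Setting `height = 0` in `position_jitter()` keeps each point at its true polarity value, so only the horizontal position is randomized to reduce overplotting.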
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Examine
___
# Summary by document
___
# Box plot
ggplot(___, aes(x = ___, y = ___)) +
___(fill = c("#bada55", "#F00B42", "#F001ED", "#BA6E15"), col = "___") +
___(position = position_jitter(width = 0.1, height = 0), alpha = 0.02) +
theme_gdocs() +
ggtitle("Book Polarity")