Exercise

Box plot

An easy way to compare multiple distributions is with a box plot. The code in this exercise draws several box plots side by side in one compact visual.

In this exercise, the all_book_polarity object is already loaded. The data frame contains two columns, book and polarity, and holds the scores produced by applying qdap's polarity() function to every book. Here are the first 3 rows of this large object.

   book   polarity
14 huck  0.2773501
22 huck  0.2581989
26 huck -0.5773503

This exercise introduces tapply(), which applies a function over a ragged array, that is, over groups that may differ in size. You pass in a vector of values and then a factor of the same length that assigns each value to a group. The third argument, a function such as min(), is then applied to the values in each group. For example, here is some code that uses tapply() on two vectors.

f1 <- as.factor(c("Group1", "Group2", "Group1", "Group2"))
stat1 <- c(1, 2, 1, 2)
tapply(stat1, f1, sum)

The result is an array where Group1 has a value of 2 (1+1) and Group2 has a value of 4 (2+2).
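To show what "ragged" means in practice, here is a minimal sketch in which the groups differ in size. The vectors grp and scores are made-up illustration data and are not part of the exercise.

```r
# Hypothetical data: three values in group "a", two in group "b"
grp    <- factor(c("a", "b", "a", "b", "a"))
scores <- c(0.5, -0.2, 0.1, 0.6, -0.3)

# tapply() returns one result per factor level, whatever the group size
group_means <- tapply(scores, grp, mean)
group_means

# When the function returns more than one value per group (here min and max),
# the result holds one vector per factor level
group_ranges <- tapply(scores, grp, range)
group_ranges[["a"]]
```

The same pattern applies to any function that accepts a numeric vector, including summary().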

Instructions

  • Since it's already loaded, examine all_book_polarity with str().
  • Using tapply(), pass in all_book_polarity$polarity, all_book_polarity$book and the summary() function. This prints summary statistics of the polarity() scores for each of the 4 books. You would expect Oz and Huck Finn to have higher averages than Agamemnon or Moby Dick. Pay close attention to the median.
  • Create a box plot with ggplot() by passing in all_book_polarity.
    • Aesthetics should be aes(x = book, y = polarity).
    • Using +, add a geom_boxplot() layer with col = "darkred". Pay close attention to the dark line in each box, which represents the median.
    • Next add another layer called geom_jitter() to add points for each of the words.
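
The steps above might be sketched as follows. Because all_book_polarity exists only inside the exercise, this sketch builds a small stand-in data frame with random scores. The name demo_polarity, the book labels other than huck, and all values are assumptions for illustration, and the code assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Stand-in for all_book_polarity: made-up scores, NOT real polarity() output.
# Book labels other than "huck" are hypothetical.
set.seed(42)
demo_polarity <- data.frame(
  book     = rep(c("huck", "oz", "moby", "agamemnon"), times = c(30, 25, 40, 20)),
  polarity = rnorm(115, mean = 0, sd = 0.4)
)

# Inspect the structure: column names, types, and first values
str(demo_polarity)

# Summary statistics of the polarity column, computed separately per book
book_summaries <- tapply(demo_polarity$polarity, demo_polarity$book, summary)
book_summaries

# One box per book, with the individual scores jittered on top
p <- ggplot(demo_polarity, aes(x = book, y = polarity)) +
  geom_boxplot(col = "darkred") +
  geom_jitter()
p
```

With the real data you would replace demo_polarity with all_book_polarity. The random stand-in will not reproduce the differences between books described above.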