DTM vs. tidytext matrix
The tidyverse is a collection of R packages that share common philosophies and are designed to work together. This chapter covers some tidy functions to manipulate data. In this exercise you will compare a DTM to a tidy text data frame called a tibble.
Within the tidyverse, each observation is a single row in a data frame. That makes working across different packages much easier, since the fundamental data structure is the same. Parts of this course borrow heavily from the tidytext package, which uses this data organization.
For example, you may already be familiar with the %>% operator from the magrittr package. This forwards an object on its left-hand side as the first argument of the function on its right-hand side.
In the example below, the data object is forwarded to function1(). Notice that its parentheses are empty. The output of function1() is in turn forwarded to function2(), so you don't have to pass the data object there either. However, you do set a fictitious parameter, some_parameter, to TRUE. The result of the whole pipeline is assigned to object.
object <- data %>%
  function1() %>%
  function2(some_parameter = TRUE)
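To make the placeholder pipeline concrete, here is a minimal sketch that uses real base R functions in place of function1() and function2(). The values vector is made up for illustration, and na.rm = TRUE stands in for some_parameter = TRUE.

```r
library(magrittr)

# Hypothetical input: three perfect squares and one missing value
values <- c(4, 9, 16, NA)

# Forward values into sqrt(), then forward that result into sum();
# na.rm = TRUE plays the role of some_parameter = TRUE
piped <- values %>%
  sqrt() %>%
  sum(na.rm = TRUE)

# The same computation written as nested calls, read inside-out
nested <- sum(sqrt(values), na.rm = TRUE)

# Both forms give the same result
identical(piped, nested)
```

The piped version reads top to bottom in the order the steps happen, which is the main reason the course favors it over nested calls.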
To use the %>% operator, you don't necessarily need to load the magrittr package, since dplyr re-exports it and makes it available as soon as dplyr is loaded.
dplyr also contains the functions inner_join() (which you'll learn more about later) and count() for tallying data. The last function you'll need is mutate() to create new variables or modify existing ones.
object <- data %>%
  mutate(new_Var_name = Var1 - Var2)
Or, to modify an existing variable:
object <- data %>%
  mutate(Var1 = as.factor(Var1))
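The three dplyr functions can be chained in one pipeline. The sketch below is only an illustration: the words and lexicon tibbles are hypothetical toy data, not the course data or a real sentiment lexicon.

```r
library(dplyr)

# Hypothetical toy data: one word per row, tagged with the line it came from
words <- tibble(
  index = c(42, 42, 43, 44),
  word  = c("murder", "grief", "joy", "murder")
)

# Hypothetical miniature lexicon (made up for this example)
lexicon <- tibble(
  word      = c("murder", "grief", "joy"),
  sentiment = c("negative", "negative", "positive")
)

tallied <- words %>%
  inner_join(lexicon, by = "word") %>%      # keep words found in both tables
  count(index, sentiment) %>%               # tally words per line and sentiment
  mutate(sentiment = as.factor(sentiment))  # modify an existing variable

tallied
```

count() stores its tallies in a column named n, which is the column later steps would reshape or summarize.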
You will also use tidyr's pivot_wider() function to organize the data with each row being a line from the book and the positive and negative values as columns.
| index | negative | positive |
|---|---|---|
| 42 | 2 | 0 |
| 43 | 0 | 1 |
| 44 | 1 | 0 |
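A table shaped like the one above could be produced roughly as follows. This is a sketch under assumptions: the tallied tibble is hand-built to match the table, and the polarity column is an illustrative addition rather than part of the exercise.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format tallies: one row per (line, sentiment) pair
tallied <- tibble(
  index     = c(42, 43, 44),
  sentiment = c("negative", "positive", "negative"),
  n         = c(2, 1, 1)
)

wide <- tallied %>%
  # One column per sentiment; combinations that never occurred become 0
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  # Illustrative extra step: net sentiment per line
  mutate(polarity = positive - negative)

wide
```

Setting values_fill = 0 matters here: without it, a line with no positive words would show NA instead of 0, and the subtraction in mutate() would return NA as well.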
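To change a DTM to a tidy format, use tidy(). The tidy() generic originated in the broom package, and the method for document-term matrices is supplied by tidytext, so tidytext needs to be loaded.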
tidy_format <- tidy(Document_Term_Matrix)
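The difference between the two formats is easiest to see on a tiny corpus. The following sketch assumes the tm and tidytext packages are installed; the two one-line documents and the toy_* names are invented for illustration and are unrelated to ag_dtm.

```r
library(tm)
library(tidytext)

# Two hypothetical one-line documents
docs <- c("the king returns home", "the queen plots murder")

# Build a small document-term matrix with tm
toy_dtm <- DocumentTermMatrix(VCorpus(VectorSource(docs)))

# Matrix view: one row per document, one column per term, mostly zeros
toy_m <- as.matrix(toy_dtm)
dim(toy_m)

# Tidy view: one row per non-zero (document, term) pair
toy_tidy <- tidy(toy_dtm)
toy_tidy
```

The matrix stores every document-term combination, including the zeros, while the tidy tibble keeps only the pairs that actually occur, in the columns document, term, and count.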
This exercise uses text from the Greek tragedy Agamemnon, a story about marital infidelity and murder.
This exercise is part of the course
Sentiment Analysis in R
Exercise instructions
We've already created a clean DTM called ag_dtm for this exercise.
- Create ag_dtm_m by applying as.matrix() to ag_dtm.
- Using brackets, [ and ], index ag_dtm_m to row 2206.
- Apply tidy() to ag_dtm. Call the new object ag_tidy.
- Examine ag_tidy at rows [831:835, ] to compare the tidy format. You will see a common word from the examined part of ag_dtm_m in step 2.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# As matrix
ag_dtm_m <- ___
# Examine line 2206 and columns 245:250
ag_dtm_m[___, 245:250]
# Tidy up the DTM
ag_tidy <- ___
# Examine tidy with a word you saw
ag_tidy[___, ]