
Make a document-term matrix

Hopefully, you are not too tired after all this basic text mining work! Just in case, let's revisit the coffee and get some Starbucks while building a document-term matrix from coffee tweets.

Beginning with the coffee.csv file, we have used common transformations to produce a clean corpus called clean_corp.

The document-term matrix (DTM) is used when you want each document represented as a row. This can be useful if you are comparing authors within rows, or if the data is arranged chronologically and you want to preserve the time series. The tm package stores a DTM in a sparse "simple triplet matrix" class. However, it is often easier to manipulate and examine the object by converting the DTM to an ordinary matrix with as.matrix().
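As a rough illustration of the two representations, the sketch below builds a DTM from three made-up documents and converts it to an ordinary matrix. The toy_* names and the sample texts are hypothetical and are not part of the exercise data. It assumes the tm package is installed and relies on the default DocumentTermMatrix() settings.

```r
library(tm)

# Hypothetical stand-in for a cleaned corpus: three tiny "tweets"
toy_texts <- c("coffee starbucks morning",
               "star coffee",
               "starbucks starbucks latte")
toy_corp <- VCorpus(VectorSource(toy_texts))

# One row per document, one column per term, stored sparsely
toy_dtm <- DocumentTermMatrix(toy_corp)
inherits(toy_dtm, "simple_triplet_matrix")

# Convert to an ordinary dense matrix for easier inspection
toy_m <- as.matrix(toy_dtm)
dim(toy_m)
toy_m[, "starbucks"]
```

The dense form is convenient for small examples like this one, but it stores every zero explicitly, so it can become large for a real corpus.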

This exercise is part of the course

Text Mining with Bag-of-Words in R


Exercise instructions

  • Create coffee_dtm by applying DocumentTermMatrix() to clean_corp.
  • Create coffee_m, a matrix version of coffee_dtm, using as.matrix().
  • Print the dimensions of coffee_m to the console using the dim() function. Note the number of rows and columns.
  • Print the subset of coffee_m containing documents (rows) 25 through 35 and terms (columns) "star" and "starbucks".
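The last two steps rely on ordinary matrix indexing: a numeric range selects rows, and a character vector selects columns by name. Here is a minimal base R sketch on a made-up count matrix. toy_counts and its values are hypothetical, and the row range is shortened to fit the toy data.

```r
# Hypothetical stand-in for a document-term matrix: 4 documents by 3 terms
toy_counts <- matrix(
  c(1, 0, 2,
    0, 1, 0,
    0, 0, 1,
    3, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(Docs = as.character(1:4),
                  Terms = c("cup", "star", "starbucks"))
)

# Number of rows (documents) and columns (terms)
dim(toy_counts)

# Rows 2 through 4, restricted to two named term columns
toy_subset <- toy_counts[2:4, c("star", "starbucks")]
toy_subset
```

Selecting columns by name rather than by position keeps the code readable and does not depend on how the terms happen to be ordered.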

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Create the document-term matrix from the corpus
coffee_dtm <- ___

# Print out coffee_dtm data
coffee_dtm

# Convert coffee_dtm to a matrix
coffee_m <- ___

# Print the dimensions of coffee_m
___

# Review a portion of the matrix to get some Starbucks
___[___:___, c("star", "___")]