
Make a term-document matrix

You're almost done with the not-so-exciting foundational work before we get to some fun visualizations and analyses based on the concepts you've learned so far!

In this exercise, you perform a similar process, but this time you take the transpose of the document-term matrix. The resulting term-document matrix (TDM) has terms as its rows (the row names) and documents as its columns (the column names).
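To see the transpose relationship concretely, here is a minimal sketch on a tiny corpus. It assumes the tm package is installed. The three short documents and the names `toy_docs`, `toy_corp`, `m_dtm` and `m_tdm` are hypothetical stand-ins for the course's `clean_corp`, used only for illustration.

```r
# Sketch: a term-document matrix holds the same counts as a
# document-term matrix, with rows and columns swapped.
# toy_docs / toy_corp are hypothetical stand-ins for clean_corp.
library(tm)

toy_docs <- c("coffee at starbucks",
              "iced coffee and more coffee",
              "star rated coffee shop")
toy_corp <- VCorpus(VectorSource(toy_docs))

toy_dtm <- DocumentTermMatrix(toy_corp)  # documents as rows, terms as columns
toy_tdm <- TermDocumentMatrix(toy_corp)  # terms as rows, documents as columns

m_dtm <- as.matrix(toy_dtm)
m_tdm <- as.matrix(toy_tdm)

# The dimensions are reversed
print(dim(m_dtm))
print(dim(m_tdm))

# Transposing the DTM (aligned by term and document names) gives the TDM
print(all(m_tdm == t(m_dtm)[rownames(m_tdm), colnames(m_tdm)]))
```

The comparison indexes by row and column names so it does not rely on both objects listing terms in the same order.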

The TDM is often the matrix used for language analysis, because you likely have more terms than authors or documents, and life is generally easier when you have more rows than columns. An easy way to start analyzing the information is to convert the sparse TDM into an ordinary R matrix by calling as.matrix() on it.
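The following sketch shows what as.matrix() changes, again on a hypothetical toy corpus (`toy_docs`, `toy_tdm` and `toy_m` are illustrative names, and the tm package is assumed to be installed). The TDM object inherits from the sparse `simple_triplet_matrix` class, while `as.matrix()` returns a plain dense matrix that base R functions such as `dim()` and `rowSums()` work on directly.

```r
# Sketch: converting a sparse TDM into a dense base R matrix.
# toy_docs is a hypothetical stand-in for the cleaned coffee tweets.
library(tm)

toy_docs <- c("coffee at starbucks",
              "iced coffee and more coffee",
              "star rated coffee shop")
toy_tdm <- TermDocumentMatrix(VCorpus(VectorSource(toy_docs)))

# The TDM stores only the non-zero counts
print(class(toy_tdm))

# as.matrix() expands it into an ordinary dense matrix
toy_m <- as.matrix(toy_tdm)
print(is.matrix(toy_m))

# Terms (rows) outnumber documents (columns)
print(dim(toy_m))

# Total count of each term across all documents, most frequent first
print(sort(rowSums(toy_m), decreasing = TRUE))
```

Because the dense matrix stores every zero explicitly, it can be far larger in memory than the sparse TDM on a big corpus, so converting is most comfortable on modest collections like this one.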

This exercise is part of the course Text Mining with Bag-of-Words in R.

Exercise instructions

  • Create coffee_tdm by applying TermDocumentMatrix() to clean_corp.
  • Create coffee_m by converting coffee_tdm to a matrix using as.matrix().
  • Print the dimensions of coffee_m to the console. Note the number of rows and columns.
  • Print the subset of coffee_m containing terms (rows) "star" and "starbucks" and documents (columns) 25 through 35.
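The last step mixes two kinds of indexing: rows by term name and columns by position. Here is a sketch on a hypothetical four-document corpus (too small for documents 25 through 35, so columns 2 through 4 stand in). It assumes the tm package is installed; `toy_docs`, `toy_m`, `sub_m` and `wanted` are illustrative names.

```r
# Sketch: selecting terms by name and documents by position.
# toy_docs is a hypothetical stand-in for the coffee tweets.
library(tm)

toy_docs <- c("coffee at starbucks",
              "iced coffee and more coffee",
              "star rated coffee shop",
              "starbucks star drink")
toy_m <- as.matrix(TermDocumentMatrix(VCorpus(VectorSource(toy_docs))))

# Rows chosen by term name, columns by document position
sub_m <- toy_m[c("star", "starbucks"), 2:4]
print(sub_m)

# A term missing from the vocabulary makes `[` stop with
# "subscript out of bounds"; intersect() keeps only terms that exist
wanted <- c("star", "starbucks", "latte")
print(toy_m[intersect(wanted, rownames(toy_m)), 2:4])
```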

Hands-on interactive exercise

Try this exercise by working through the following code.

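# Load tm, which provides TermDocumentMatrix() (clean_corp comes from the previous exercise)
library(tm)

# Create a term-document matrix from the corpus
coffee_tdm <- TermDocumentMatrix(clean_corp)

# Print coffee_tdm data
coffee_tdm

# Convert coffee_tdm to a matrix
coffee_m <- as.matrix(coffee_tdm)

# Print the dimensions of the matrix
dim(coffee_m)

# Review a portion of the matrix: terms "star" and "starbucks", documents 25 to 35
coffee_m[c("star", "starbucks"), 25:35]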