
Make a term-document matrix

You're almost done with the not-so-exciting foundational work before we get to some fun visualizations and analyses based on the concepts you've learned so far!

In this exercise, you will follow a similar process but build the transpose of the document-term matrix. A term-document matrix (TDM) has one row per term, with the terms as row names, and one column per document, with the document names as column names.
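To make the row/column swap concrete, here is a minimal base R sketch. The matrix, its counts, and the names doc1 and doc2 are made up for illustration and are not part of the exercise data.

```r
# Toy document-term matrix: documents in rows, terms in columns.
# All names and counts are hypothetical.
dtm <- matrix(
  c(1, 0, 2,
    0, 3, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(Docs  = c("doc1", "doc2"),
                  Terms = c("coffee", "latte", "starbucks"))
)

# Transposing swaps the roles: terms become rows, documents become columns.
tdm <- t(dtm)

dim(dtm)  # documents x terms
dim(tdm)  # terms x documents

# The same count is reachable in both layouts, with the indices swapped.
dtm["doc1", "starbucks"]
tdm["starbucks", "doc1"]
```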

The TDM is often the matrix used for language analysis. You are likely to have more terms than authors or documents, and life is generally easier when you have more rows than columns. An easy way to start analyzing the information is to convert the TDM into a regular matrix using as.matrix().
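As a rough illustration of that conversion, the sketch below builds a TDM from a hypothetical three-document corpus (toy_corp stands in for the course's clean_corp, which is not reproduced here) and converts it with as.matrix(). It assumes the tm package is installed.

```r
library(tm)

# Hypothetical three-document corpus, standing in for clean_corp.
toy_corp <- VCorpus(VectorSource(c(
  "coffee from starbucks",
  "starbucks coffee again",
  "tea instead"
)))

# TermDocumentMatrix() returns a sparse, tm-specific object.
toy_tdm <- TermDocumentMatrix(toy_corp)
class(toy_tdm)

# as.matrix() turns it into an ordinary base R matrix,
# with terms as rows and documents as columns.
toy_m <- as.matrix(toy_tdm)
is.matrix(toy_m)
dim(toy_m)
```

Once the data is a plain matrix, the usual base R tools for indexing and summarizing matrices apply directly.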

This exercise is part of the course

Text Mining with Bag-of-Words in R


Exercise instructions

  • Create coffee_tdm by applying TermDocumentMatrix() to clean_corp.
  • Create coffee_m by converting coffee_tdm to a matrix using as.matrix().
  • Print the dimensions of coffee_m to the console. Note the number of rows and columns.
  • Print the subset of coffee_m containing terms (rows) "star" and "starbucks" and documents (columns) 25 through 35.
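The same sequence of steps can be sketched on a small stand-in corpus. Everything prefixed with demo_ is hypothetical, and the column range 2:3 is used only because the toy corpus has four documents. It assumes the tm package is installed, and it is not the exercise solution, which works on clean_corp.

```r
library(tm)

# Hypothetical four-document corpus; the real exercise uses clean_corp.
demo_corp <- VCorpus(VectorSource(c(
  "star of the show",
  "starbucks coffee",
  "star barista works for starbucks",
  "plain coffee"
)))

# Step 1: build the term-document matrix.
demo_tdm <- TermDocumentMatrix(demo_corp)

# Step 2: convert it to a regular matrix.
demo_m <- as.matrix(demo_tdm)

# Step 3: inspect the dimensions (rows are terms, columns are documents).
dim(demo_m)

# Step 4: select rows by term name and columns by document position.
demo_sub <- demo_m[c("star", "starbucks"), 2:3]
demo_sub
```

Selecting rows with a character vector relies on the terms being stored as row names, which is why the row/column orientation of the TDM matters here.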

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create a term-document matrix from the corpus
coffee_tdm <- ___

# Print coffee_tdm data
coffee_tdm

# Convert coffee_tdm to a matrix
coffee_m <- ___

# Print the dimensions of the matrix
___

# Review a portion of the matrix
___[___, ___]