1. Learn
  2. /
  3. Courses
  4. /
  5. Text Mining with Bag-of-Words in R

Exercise

Make a term-document matrix

You're almost done with the not-so-exciting foundational work before we get to some fun visualizations and analyses based on the concepts you've learned so far!

In this exercise, you are performing a similar process but taking the transpose of the document-term matrix. In this case, the term-document matrix has terms in the first column and documents across the top as individual column names.

The TDM is often the matrix used for language analysis. This is because you likely have more terms than authors or documents and life is generally easier when you have more rows than columns. An easy way to start analyzing the information is to change the matrix into a simple matrix using as.matrix() on the TDM.

Instructions

100 XP
  • Create coffee_tdm by applying TermDocumentMatrix() to clean_corp.
  • Create coffee_m by converting coffee_tdm to a matrix using as.matrix().
  • Print the dimensions of coffee_m to the console. Note the number of rows and columns.
  • Print the subset of coffee_m containing terms (rows) "star" and "starbucks" and documents (columns) 25 through 35.