Make a term-document matrix
You're almost done with the not-so-exciting foundational work before we get to some fun visualizations and analyses based on the concepts you've learned so far!
In this exercise, you follow a similar process but build the transpose of the document-term matrix. The term-document matrix (TDM) has one term per row, with the terms as row names, and documents across the top as individual column names.
The TDM is often the matrix used for language analysis. This is because you likely have more terms than authors or documents, and life is generally easier when you have more rows than columns. An easy way to start analyzing the information is to convert the TDM into a simple matrix by calling as.matrix() on it.
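To make the transpose relationship concrete, here is a minimal sketch on a made-up three-document corpus. It assumes the tm package is installed; toy_docs and the other toy_ names are hypothetical and are not part of the course data.

```r
# Assumes the tm package is installed; toy_docs is a made-up corpus.
library(tm)

toy_docs <- c("coffee tastes great",
              "starbucks coffee star",
              "star rating for starbucks")
toy_corp <- VCorpus(VectorSource(toy_docs))

# Documents as rows, terms as columns
toy_dtm <- DocumentTermMatrix(toy_corp)

# Terms as rows, documents as columns
toy_tdm <- TermDocumentMatrix(toy_corp)

# Convert the sparse objects to ordinary matrices
toy_dtm_m <- as.matrix(toy_dtm)
toy_tdm_m <- as.matrix(toy_tdm)

# The row and column counts swap between the two
dim(toy_dtm_m)
dim(toy_tdm_m)
```

Once converted, the result is a regular R matrix, so base functions such as dim() and bracket indexing work on it directly.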
This exercise is part of the course
Text Mining with Bag-of-Words in R
Exercise instructions
- Create coffee_tdm by applying TermDocumentMatrix() to clean_corp.
- Create coffee_m by converting coffee_tdm to a matrix using as.matrix().
- Print the dimensions of coffee_m to the console. Note the number of rows and columns.
- Print the subset of coffee_m containing terms (rows) "star" and "starbucks" and documents (columns) 25 through 35.
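The last two steps rely on ordinary matrix tools: dim() reports rows then columns, and rows can be selected by term name while columns are selected by position. The sketch below shows the same pattern on a made-up corpus, assuming the tm package is installed; toy_docs, toy_m, and toy_sub are hypothetical names, not objects from the exercise.

```r
# Assumes the tm package is installed; toy_docs is a made-up corpus.
library(tm)

toy_docs <- c("coffee tastes great",
              "starbucks coffee star",
              "star rating for starbucks")
toy_tdm <- TermDocumentMatrix(VCorpus(VectorSource(toy_docs)))
toy_m <- as.matrix(toy_tdm)

# Number of terms (rows), then number of documents (columns)
dim(toy_m)

# Select rows by term name and columns by document position
toy_sub <- toy_m[c("star", "starbucks"), 2:3]
toy_sub
```

Selecting a term that is not a row name raises a "subscript out of bounds" error, so it helps to check rownames() first when unsure whether a term survived cleaning.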
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a term-document matrix from the corpus
coffee_tdm <- ___
# Print coffee_tdm data
coffee_tdm
# Convert coffee_tdm to a matrix
coffee_m <- ___
# Print the dimensions of the matrix
___
# Review a portion of the matrix
___[___, ___]