The TDM & DTM

1. The TDM & DTM

Now that your corpus is clean, you need to convert it into a data structure suited to analysis. The foundation of bag-of-words text mining is either the term-document matrix (TDM) or the document-term matrix (DTM).

2. TDM vs. DTM

The term-document matrix represents each word in the corpus as a row and each document as a column. In this example, you simply call the TermDocumentMatrix function on a corpus to create a TDM. The document-term matrix is the transpose of the TDM, so each document is a row and each word is a column. Once again, the aptly named DocumentTermMatrix function creates the matrix, with documents as rows, as shown here. In their simplest form, the matrices contain word frequencies, that is, the count of each word in each document. However, other frequency measures do exist.
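
As a minimal sketch of the two functions just described, the snippet below builds both matrices from a tiny corpus. It assumes the tm package is installed; the three example sentences and the variable names (docs, corpus, tdm, dtm) are made up for illustration and are not from the course data.

```r
library(tm)

# Hypothetical three-document corpus, for illustration only
docs <- c("coffee is great", "tea is fine", "coffee and tea")
corpus <- VCorpus(VectorSource(docs))

# TDM: one row per term, one column per document
tdm <- TermDocumentMatrix(corpus)

# DTM: one row per document, one column per term
dtm <- DocumentTermMatrix(corpus)

# Convert the sparse objects to ordinary matrices to inspect them
tdm_m <- as.matrix(tdm)
dtm_m <- as.matrix(dtm)

# The dimensions are swapped between the two matrices
dim(tdm_m)
dim(dtm_m)

# Word counts for "coffee" across the documents
tdm_m["coffee", ]

# The DTM holds the same counts as the transposed TDM
all(t(tdm_m) == dtm_m)
```

Note that, with tm's default settings, very short words such as "is" are dropped, so the number of terms can be smaller than the number of distinct words in the text.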

3. Word Frequency Matrix (WFM)

The qdap package relies on a word frequency matrix instead. This course doesn't focus on the word frequency matrix, since it is less popular and can be made from a term-document matrix.

4. Let's practice!