Exercise

Align corpus

You have an LDA model object, train_mod, and a table, corpus2, that holds the initial data. You need to align the vocabulary of the test records with the vocabulary of the training model and then build a document-term matrix for testing.

Instructions
  • Rerun sample.int() after set.seed(), using the same seed and arguments as for training, to reproduce the row indices of the testing rows.
  • Extract the vocabulary of the training model using tidy().
  • Create a table of word counts for the testing rows, keeping only the words that were present in the training data.
  • Generate a document-term matrix from the testing data.
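The four steps above can be sketched in R roughly as follows. This is a minimal sketch, not the official solution. It assumes the dplyr, tidytext, and topicmodels packages (tidytext's cast_dtm() also needs tm installed). The toy corpus2 (with id and text columns), the training size of 4, the seed 12345, and k = 2 are hypothetical stand-ins, since the exercise does not state how the original split was made. With real data you must reuse exactly the seed and sample.int() arguments from training, or the "test" rows will not match.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical toy stand-ins so the sketch runs on its own;
# in the exercise, corpus2 and train_mod already exist.
corpus2 <- tibble(
  id = 1:6,
  text = c("apple banana apple", "banana cherry", "cherry apple date",
           "kiwi apple banana", "date date cherry", "mango banana kiwi")
)
n_docs <- nrow(corpus2)
train_size <- 4  # hypothetical: use the size from the original split

set.seed(12345)  # hypothetical seed: must match the one used for training
train_rows <- sample.int(n_docs, size = train_size)
train_dtm <- corpus2[train_rows, ] %>%
  unnest_tokens(output = word, input = text) %>%
  count(id, word) %>%
  cast_dtm(document = id, term = word, value = n)
train_mod <- LDA(train_dtm, k = 2, control = list(seed = 1))

# Step 1: same seed + same sample.int() call reproduces the training rows;
# the testing rows are the ones left over
set.seed(12345)
train_rows <- sample.int(n_docs, size = train_size)
test_rows <- setdiff(seq_len(n_docs), train_rows)

# Step 2: vocabulary of the trained model, one row per distinct term
model_vocab <- tidy(train_mod, matrix = "beta") %>%
  distinct(term)

# Step 3: word counts for the testing rows; semi_join() drops any word
# that is not in the training vocabulary
test_table <- corpus2[test_rows, ] %>%
  unnest_tokens(output = word, input = text) %>%
  count(id, word) %>%
  semi_join(model_vocab, by = c("word" = "term"))

# Step 4: document-term matrix built only from the aligned counts
test_dtm <- test_table %>%
  cast_dtm(document = id, term = word, value = n)
```

semi_join() filters rows without adding columns, so every term in test_dtm is one that train_mod was trained on; words seen only in the test documents (such as "mango" if it never appeared in training) are discarded rather than introduced as unknown terms.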