Exercise

From dtm to topic model

You are given a data frame, corpus. Each row corresponds to one occurrence of a named entity. The column doc_id contains the entity, and the column text contains the context words with suffixes. You will build a document-term matrix and then fit a topic model.

Instructions 1/4
  • We need to combine the text from multiple occurrences of the same entity into one document. Using dplyr, group the rows by entity (doc_id) and generate a summary variable doc that contains the combined text strings for each entity. Save the result in the table corpus2.
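
One possible way to approach this step is sketched below. The toy corpus, including its entity names and suffixed tokens, is invented here so the snippet runs on its own; in the exercise, corpus is already loaded. Joining the strings with a single space is also an assumption, since the instruction does not name a separator.

```r
library(dplyr)

# Hypothetical stand-in for the exercise's `corpus` (values are made up)
corpus <- data.frame(
  doc_id = c("washington", "paris", "washington"),
  text   = c("president_L1 visited_R1", "capital_L1 of_R1", "general_L1 said_R1"),
  stringsAsFactors = FALSE
)

# One row per entity: paste all of its context strings into a single document
corpus2 <- corpus %>%
  group_by(doc_id) %>%
  summarise(doc = paste(text, collapse = " "))
```

With collapse set, paste() returns one string per group, so corpus2 has exactly one row for each distinct doc_id.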