Exercise

TM refresher (I)

In the Text Mining: Bag of Words course you learned that a corpus is a set of texts, and you studied some functions for preprocessing text. To recap, one way to create and clean a corpus is with the functions below. Although this is a different course, sentiment analysis is part of text mining, so a refresher is helpful.

  • Turn a character vector into a text source using VectorSource().
  • Turn a text source into a corpus using VCorpus().
  • Remove unwanted characters from the corpus using cleaning functions like removePunctuation() and stripWhitespace() from tm, and replace_abbreviation() from qdap.
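The three steps above can be sketched as follows. This is a minimal illustration that assumes the tm package is installed; the vector example_text and its contents are made up for the sketch and are not the course data. The qdap step is left out so that the example depends only on tm.

```r
library(tm)

# Hypothetical two-document character vector (not the course's tm_define)
example_text <- c("Text   mining,  is fun!", "Sentiment analysis: also fun.")

# Step 1: character vector -> text source
example_source <- VectorSource(example_text)

# Step 2: text source -> volatile corpus
example_corpus <- VCorpus(example_source)

# Step 3: apply tm cleaning functions to every document with tm_map()
example_corpus <- tm_map(example_corpus, removePunctuation)
example_corpus <- tm_map(example_corpus, stripWhitespace)

# Text of the first document after cleaning
content(example_corpus[[1]])
# "Text mining is fun"
```

Punctuation is removed before whitespace is collapsed here, so any gaps left behind by deleted characters are cleaned up in the final step.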

For this exercise, a custom clean_corpus() function has been created. It bundles standard preprocessing functions so they can be applied in a single call.

clean_corpus() accepts the output of VCorpus() and applies cleaning functions. For example:

processed_corpus <- clean_corpus(my_corpus)
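The body of clean_corpus() is not shown in this exercise, so the definition below is only a plausible sketch of what such a wrapper might look like, not the course's actual function. It assumes the tm package and uses only tm functions; the name clean_corpus_sketch, the choice and order of steps, and the sample text are all assumptions.

```r
library(tm)

# Hypothetical stand-in for the course's clean_corpus(); steps are assumed
clean_corpus_sketch <- function(corpus) {
  # Lowercase the text; base functions need content_transformer()
  corpus <- tm_map(corpus, content_transformer(tolower))
  # Drop punctuation and digits
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeNumbers)
  # Remove common English stop words
  corpus <- tm_map(corpus, removeWords, stopwords("en"))
  # Collapse runs of whitespace into single spaces
  corpus <- tm_map(corpus, stripWhitespace)
  corpus
}

# Usage on a made-up corpus
my_corpus <- VCorpus(VectorSource(c("The 2 Cats sat, quietly!", "Dogs bark.")))
processed_corpus <- clean_corpus_sketch(my_corpus)
content(processed_corpus[[1]])
```

Wrapping the steps in one function keeps the order of operations fixed, which matters because, for example, stop-word matching is case sensitive and so should come after lowercasing.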

Instructions

Your R session contains a text vector, tm_define, which holds two small documents, as well as the function clean_corpus().

  • Create an object called tm_vector by applying VectorSource() to tm_define.
  • Make tm_corpus using VCorpus() on tm_vector.
  • Use content() to examine the contents of the first document in tm_corpus.
    • Documents in the corpus are accessed using list syntax, so use double square brackets, e.g. [[1]].
  • Clean the corpus text using the custom function clean_corpus() on tm_corpus. Call this new object tm_clean.
  • Examine the first document of the new tm_clean object again to see how the text changed after clean_corpus() was applied.
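As a reminder of the list syntax mentioned above, the sketch below shows the difference between double and single square brackets on a corpus. It assumes the tm package; the vector demo_text is invented for illustration and is not the tm_define vector from your session.

```r
library(tm)

# Hypothetical two-document corpus for demonstrating access syntax
demo_text <- c("First document.", "Second document.")
demo_corpus <- VCorpus(VectorSource(demo_text))

# Double brackets return a single document object
first_doc <- demo_corpus[[1]]

# content() extracts the text stored in that document
content(first_doc)
# "First document."

# Single brackets return a smaller corpus, not a document
sub_corpus <- demo_corpus[1]
length(sub_corpus)
# 1
```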