TM refresher (I)

In the Text Mining: Bag of Words course you learned that a corpus is a set of texts, and you studied some functions for preprocessing the text. Sentiment analysis is part of text mining, so a short refresher is helpful even though this is a different course. To recap, one way to create and clean a corpus is with the functions below:

Turn a character vector into a text source using VectorSource().
Turn a text source into a corpus using VCorpus().
Remove unwanted characters from the corpus using cleaning functions like removePunctuation() and stripWhitespace() from tm, and replace_abbreviation() from qdap.
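
The three steps above can be sketched roughly as follows. This is a minimal illustration, assuming the tm package is installed; the docs vector is made-up sample text, not the course data, and the qdap step is left out to keep the sketch dependent on tm only.

```r
library(tm)

# Hypothetical sample text (an assumption, not the course's data)
docs <- c("Text mining is fun!!  Really.", "Clean   the corpus, please.")

src  <- VectorSource(docs)  # character vector -> text source
corp <- VCorpus(src)        # text source -> volatile corpus

# tm_map() applies a cleaning function to every document in the corpus
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, stripWhitespace)

# Inspect the text of the first cleaned document
content(corp[[1]])
```

Each tm_map() call returns a new corpus, so the cleaning steps are chained by reassigning corp.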

In this exercise, a custom clean_corpus() function has been created for you. It wraps standard preprocessing functions so they can be applied in a single call.

clean_corpus() accepts the output of VCorpus() and applies cleaning functions. For example:

processed_corpus <- clean_corpus(my_corpus)
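
The body of clean_corpus() is not shown in this exercise. As a rough idea of what such a wrapper might look like, here is a hypothetical stand-in built only from tm functions; the particular steps chosen (lowercasing, punctuation removal, English stop word removal, whitespace stripping) are assumptions, and the course's actual function may differ.

```r
library(tm)

# Hypothetical stand-in for clean_corpus(); the real definition is not shown
clean_corpus <- function(corpus) {
  # Base R functions such as tolower() need the content_transformer() wrapper
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords("en"))
  corpus <- tm_map(corpus, stripWhitespace)
  corpus
}

# Made-up one-document corpus for illustration
my_corpus <- VCorpus(VectorSource("The QUICK, brown fox!"))
processed_corpus <- clean_corpus(my_corpus)
content(processed_corpus[[1]])
```

Bundling the steps in one function keeps the order of the transformations fixed, which matters because, for example, stop words are matched in lowercase.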

Your R session has a text vector, tm_define, containing two small documents and the function clean_corpus().

Create an object called tm_vector by applying VectorSource() to tm_define.
Make tm_corpus using VCorpus() on tm_vector.
Use content() to examine the contents of the first document in tm_corpus.
- Documents in the corpus are accessed using list syntax, so use double square brackets, e.g. [[1]].
Clean the corpus text using the custom function clean_corpus() on tm_corpus. Call this new object tm_clean.
Examine the first document of the new tm_clean object to see how the text changed after clean_corpus() was applied.
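
One possible way to work through these steps is sketched below. In the course session, tm_define and clean_corpus() already exist; the stand-ins defined here are assumptions that are only used when those objects are missing, so the sketch can also run elsewhere.

```r
library(tm)

# Stand-ins (assumptions) used only if the course objects are not present
if (!exists("tm_define")) {
  tm_define <- c("Text mining is the process of distilling insight from text.",
                 "Sentiment analysis extracts emotional intent from text!")
}
if (!exists("clean_corpus")) {
  clean_corpus <- function(corpus) {
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    tm_map(corpus, stripWhitespace)
  }
}

# Create the text source, then the corpus
tm_vector <- VectorSource(tm_define)
tm_corpus <- VCorpus(tm_vector)

# Examine the first document before cleaning
content(tm_corpus[[1]])

# Clean the corpus and examine the first document again
tm_clean <- clean_corpus(tm_corpus)
content(tm_clean[[1]])
```

Comparing the two content() outputs side by side shows which characters the cleaning functions removed or changed.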
