
TM refresher (I)

In the Text Mining: Bag of Words course, you learned that a corpus is a set of texts, and you studied some functions for preprocessing text. Because sentiment analysis is part of text mining, a short refresher is helpful even though this is a different course. To recap, one way to create and clean a corpus is with the functions below.

For this exercise, a custom clean_corpus() function has been defined for you. It wraps several standard preprocessing functions so that they can be applied in a single call.

clean_corpus() accepts the output of VCorpus(), applies the cleaning functions, and returns the cleaned corpus. For example:

processed_corpus <- clean_corpus(my_corpus)
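The body of clean_corpus() is not shown on this page, so the following is only a minimal sketch of what such a wrapper could look like, assuming the tm package is installed. The name clean_corpus_sketch(), the choice and order of cleaning steps, and the example text are all assumptions made for illustration; the course's own function may differ.

```r
library(tm)

# Hypothetical wrapper (not the course's actual clean_corpus()):
# applies a few common tm preprocessing steps in sequence.
clean_corpus_sketch <- function(corpus) {
  # Base R functions such as tolower() must be wrapped in content_transformer()
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeNumbers)
  # Drop common English stop words such as "is" and "a"
  corpus <- tm_map(corpus, removeWords, stopwords("en"))
  # Collapse the extra spaces left behind by the removals above
  corpus <- tm_map(corpus, stripWhitespace)
  corpus
}

# Made-up example text, used only to exercise the sketch
example_text <- c("Text mining is FUN!", "A corpus holds 2 documents.")
example_corpus <- VCorpus(VectorSource(example_text))
cleaned <- clean_corpus_sketch(example_corpus)

# Inspect the first cleaned document
content(cleaned[[1]])
```

Bundling the steps in one function keeps the preprocessing consistent each time a new corpus is cleaned.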

This exercise is part of the course

Sentiment Analysis in R


Exercise instructions

Your R session has a text vector, tm_define, containing two small documents, along with the function clean_corpus().

  • Create an object called tm_vector by applying VectorSource() to tm_define.
  • Make tm_corpus using VCorpus() on tm_vector.
  • Use content() to examine the contents of the first document in tm_corpus.
    • Documents in the corpus are accessed using list syntax, so use double square brackets, e.g. [[1]].
  • Clean the corpus text using the custom function clean_corpus() on tm_corpus. Call this new object tm_clean.
  • Examine the first document of the new tm_clean object again to see how the text changed after clean_corpus() was applied.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# clean_corpus(), tm_define are pre-defined
clean_corpus
tm_define

# Create a VectorSource
tm_vector <- ___

# Apply VCorpus
tm_corpus <- ___

# Examine the first document's contents
___(___[[___]])

# Clean the text
tm_clean <- ___

# Reexamine the contents of the first doc
___
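
For reference, here is one way the steps above might be carried out end to end, assuming the tm package is installed. In the course session, tm_define and clean_corpus() are pre-defined and their contents are not shown here, so this sketch substitutes a made-up two-document vector and a simplified stand-in cleaning function; both are assumptions for illustration only.

```r
library(tm)

# Hypothetical stand-in for the pre-defined text vector (two small documents)
tm_define <- c("Text Mining turns raw text into structured data!",
               "Sentiment analysis scores the emotional tone of a document.")

# Simplified stand-in for the pre-defined clean_corpus() (assumed steps)
clean_corpus <- function(corpus) {
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  tm_map(corpus, stripWhitespace)
}

# Create a VectorSource: each vector element is treated as one document
tm_vector <- VectorSource(tm_define)

# Apply VCorpus to build a volatile (in-memory) corpus
tm_corpus <- VCorpus(tm_vector)

# Examine the first document's contents using list-style [[1]] indexing
content(tm_corpus[[1]])

# Clean the text
tm_clean <- clean_corpus(tm_corpus)

# Reexamine the contents of the first doc to see what changed
content(tm_clean[[1]])
```

Comparing the two content() calls shows the effect of cleaning: with this stand-in, the text is lowercased and its punctuation is removed.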