Make the vector a VCorpus object (1)

Recall that you loaded your text data as a vector called coffee_tweets in the last exercise. Your next step is to convert this vector of text to a corpus. As you learned in the video, a corpus is a collection of documents, but it's also important to know that in the tm package, a corpus is a data type that R recognizes.

There are two kinds of corpus data type: the permanent corpus, PCorpus, and the volatile corpus, VCorpus. In essence, the difference between the two is how the collection of documents is stored on your computer. In this course, we will use the volatile corpus, which is held in your computer's RAM rather than saved to disk, so it exists only for the duration of your R session.

To make a volatile corpus, R needs to interpret each element in our vector of text, coffee_tweets, as a document. And the tm package provides what are called Source functions to do just that! In this exercise, we'll use a Source function called VectorSource() because our text data is contained in a vector. The output of this function is called a Source object. Give it a shot!
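As a rough illustration of the idea above, here is a minimal sketch that builds a Source object from a small made-up vector. The vector example_tweets and the object example_source are hypothetical stand-ins, not the course data, and the sketch assumes the tm package is already installed.

```r
# Load tm (assumes it is installed, e.g. via install.packages("tm"))
library(tm)

# Hypothetical stand-in for coffee_tweets: a small character vector
example_tweets <- c(
  "I love my morning coffee",
  "The coffee shop on the corner is closed today",
  "Need more coffee"
)

# VectorSource() treats each element of the vector as one document
example_source <- VectorSource(example_tweets)

# Check that the result is a Source object
inherits(example_source, "Source")
```

The same pattern should apply to coffee_tweets in the exercise: pass the vector to VectorSource() and store the result under the name the instructions ask for.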

This exercise is part of the course

Text Mining with Bag-of-Words in R

Exercise instructions

  • Load the tm package.
  • Create a Source object from the coffee_tweets vector. Call this new object coffee_source.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load tm
___

# Make a vector source from coffee_tweets
___