Make the vector a VCorpus object (1)
Recall that you loaded your text data as a vector called coffee_tweets in the last exercise. Your next step is to convert this vector containing the text data to a corpus. As you learned in the video, a corpus is a collection of documents, but it's also important to know that in the tm domain, R recognizes it as a data type.
There are two kinds of corpus data type: the permanent corpus, PCorpus, and the volatile corpus, VCorpus. In essence, the difference between the two has to do with how the collection of documents is stored on your computer. In this course, we will use the volatile corpus, which is held in your computer's RAM rather than saved to disk, so it lasts only as long as the R object that holds it.
To make a volatile corpus, R needs to interpret each element in our vector of text, coffee_tweets, as a document. The tm package provides what are called Source functions to do just that! In this exercise, we'll use a Source function called VectorSource() because our text data is contained in a vector. The output of this function is called a Source object. Give it a shot!
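As a minimal sketch of what a Source function does, the snippet below runs VectorSource() on a small stand-in vector. The names example_tweets and example_source and the three sample strings are made up for illustration and are not part of the course data. The snippet assumes the tm package is installed.

```r
# Load the tm package (assumed to be installed)
library(tm)

# Hypothetical stand-in for coffee_tweets: three made-up strings
example_tweets <- c(
  "I love my morning coffee",
  "Cold brew is underrated",
  "Too much espresso today"
)

# Wrap the vector in a Source object; each element is treated as one document
example_source <- VectorSource(example_tweets)

# Inspect the result: its classes and the number of documents it holds
class(example_source)
example_source$length
```

The same call on coffee_tweets should give a Source object with one document per tweet.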
This exercise is part of the course
Text Mining with Bag-of-Words in R
Exercise instructions
- Load the tm package.
- Create a Source object from the coffee_tweets vector. Call this new object coffee_source.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load tm
___
# Make a vector source from coffee_tweets
___