
Make the vector a VCorpus object (2)

Now that we've converted our vector to a Source object, we pass it to another tm function, VCorpus(), to create our volatile corpus. Pretty straightforward, right?

The VCorpus object is a nested list, or list of lists. At each index of the VCorpus object there is a PlainTextDocument object, which is itself a list containing the actual text data (content) and some corresponding metadata (meta). It can help to visualize a VCorpus object to conceptualize the whole thing.
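To make the nesting concrete, here is a minimal sketch that builds a tiny corpus and looks inside one document. The toy_tweets vector and the toy_source and toy_corpus names are hypothetical stand-ins, since the course's coffee data is not reproduced here; it assumes the tm package is installed.

```r
library(tm)

# Hypothetical stand-in data (not the course's coffee tweets)
toy_tweets <- c("I love coffee", "Coffee keeps me awake", "Tea is fine too")

# Vector -> Source -> volatile corpus, as described in the text
toy_source <- VectorSource(toy_tweets)
toy_corpus <- VCorpus(toy_source)

# Printing the corpus gives a short summary rather than the text itself
print(toy_corpus)

# Each index of the corpus holds one document object
doc <- toy_corpus[[2]]
print(class(doc))

# Stripping the class shows the underlying list and its two named parts
print(names(unclass(doc)))
```

The last line should list the two parts the text describes, content and meta, which is why single-bracket positions 1 and 2 map to text and metadata.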

To review a single document object (the 10th), you subset with double square brackets.

coffee_corpus[[10]]

To review the actual text, you index the list twice: double brackets select the document, then single brackets select its first element, the content. To access the document's metadata, like timestamp, change [1] to [2]. Another way to review the plain text is with the content() function, which doesn't need the second set of brackets.

coffee_corpus[[10]][1]

content(coffee_corpus[[10]])
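The access routes above can be compared side by side in a small sketch. It uses a hypothetical toy_tweets vector in place of the coffee tweets and assumes the tm package is installed. The call to meta() is an addition not mentioned in the text; it is tm's accessor for a document's metadata, shown here as a counterpart to content().

```r
library(tm)

# Hypothetical stand-in data (not the course's coffee tweets)
toy_tweets <- c("I love coffee", "Coffee keeps me awake", "Tea is fine too")
toy_corpus <- VCorpus(VectorSource(toy_tweets))

# Double brackets pick the document; [1] keeps the content as a one-element list
text_as_list <- toy_corpus[[2]][1]
print(text_as_list)

# Changing [1] to [2] returns the metadata instead
meta_as_list <- toy_corpus[[2]][2]
print(meta_as_list)

# content() returns the text directly as a character vector
plain_text <- content(toy_corpus[[2]])
print(plain_text)

# meta() is the matching accessor for the metadata
print(meta(toy_corpus[[2]]))
```

The practical difference is the return type: single brackets hand back a list wrapping the text, while content() hands back the character string itself, which is usually what later cleaning steps expect.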

This exercise is part of the course

Text Mining with Bag-of-Words in R

Exercise instructions

  • Call the VCorpus() function on the coffee_source object to create coffee_corpus.
  • Verify coffee_corpus is a VCorpus object by printing it to the console.
  • Print the 15th element of coffee_corpus to the console to verify that it's a PlainTextDocument that contains the content and metadata of the 15th tweet. Use double bracket subsetting.
  • Print the content of the 15th tweet in coffee_corpus. Use double brackets to select the proper tweet, followed by single brackets to extract the content of that tweet.
  • Print the content() of the 10th tweet within coffee_corpus.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

## coffee_source is already in your workspace

# Make a volatile corpus from coffee_source
coffee_corpus <- ___

# Print out coffee_corpus
___

# Print the 15th tweet in coffee_corpus
___

# Print the contents of the 15th tweet in coffee_corpus
___

# Now use content to review the plain text of the 10th tweet
___(___[[___]])