Make the vector a VCorpus object (2)
Now that we've converted our vector to a Source object, we pass it to another tm function, VCorpus(), to create our volatile corpus. Pretty straightforward, right?
The VCorpus object is a nested list or list of lists. At each index of the VCorpus object, there is a PlainTextDocument object, which is a list containing actual text data (content), and some corresponding metadata (meta). It can help to visualize a VCorpus object to conceptualize the whole thing.
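The nesting described above can be seen on a tiny corpus. This is only a sketch: the vector toy_tweets and the names toy_corpus and second_doc are made up for illustration and are not part of the course workspace. It assumes the tm package is installed.

```r
library(tm)

# Hypothetical stand-in for a vector of tweet text
toy_tweets <- c("I love coffee", "Coffee keeps me going", "Tea is fine too")

# Vector -> Source -> volatile corpus
toy_corpus <- VCorpus(VectorSource(toy_tweets))

# The corpus holds one document per element of the vector
class(toy_corpus)
length(toy_corpus)

# Each element is a PlainTextDocument
second_doc <- toy_corpus[[2]]
class(second_doc)

# Stripping the class shows the underlying list: content and meta
str(unclass(second_doc), max.level = 1)
```

Calling unclass() here is just a way to look past the print method and see the two list components directly.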
To review a single document object (the 10th), you subset with double square brackets.
coffee_corpus[[10]]
To review the actual text, index the document a second time with single brackets: [1] returns the content. To access the document's metadata, such as the timestamp, change [1] to [2]. Another way to review the plain text is with the content() function, which takes the document directly and doesn't need the second set of brackets.
coffee_corpus[[10]][1]
content(coffee_corpus[[10]])
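As a rough illustration of the two access routes, the sketch below compares bracket indexing with tm's accessor functions on a small corpus. The names toy_tweets, toy_corpus, and doc are hypothetical and stand in for the course's coffee data. It assumes the tm package is installed.

```r
library(tm)

# Hypothetical stand-in for a vector of tweet text
toy_tweets <- c("I love coffee", "Coffee keeps me going", "Tea is fine too")
toy_corpus <- VCorpus(VectorSource(toy_tweets))

# Double brackets select one document
doc <- toy_corpus[[2]]

# Single brackets on the document: [1] holds the content, [2] the metadata
doc[1]
doc[2]

# Accessor functions return the same pieces without a second set of brackets
content(doc)
meta(doc)

# A single metadata field can be requested by name
meta(doc, "datetimestamp")
```

The accessor functions tend to read more clearly than positional indexing, since they do not depend on remembering which position holds which component.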
This exercise is part of the course
Text Mining with Bag-of-Words in R
Exercise instructions
- Call the VCorpus() function on the coffee_source object to create coffee_corpus.
- Verify coffee_corpus is a VCorpus object by printing it to the console.
- Print the 15th element of coffee_corpus to the console to verify that it's a PlainTextDocument that contains the content and metadata of the 15th tweet. Use double bracket subsetting.
- Print the content of the 15th tweet in coffee_corpus. Use double brackets to select the proper tweet, followed by single brackets to extract the content of that tweet.
- Print the content() of the 10th tweet within coffee_corpus.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
## coffee_source is already in your workspace
# Make a volatile corpus from coffee_source
coffee_corpus <- ___
# Print out coffee_corpus
___
# Print the 15th tweet in coffee_corpus
___
# Print the contents of the 15th tweet in coffee_corpus
___
# Now use content to review the plain text of the 10th tweet
___(___[[___]])