Creating a corpus
You have created a tibble called russian_tweets that contains around 20,000 tweets auto-generated by bots during the 2016 U.S. election cycle, so that you can perform text analysis. After reviewing the available options for the analysis you have chosen to do, you believe that the tm package offers the easiest path forward. To conduct the analysis, you must first create a corpus and attach potentially useful metadata.
Be aware that this is real data from Twitter, so there is always a risk that it contains profanity or other offensive content. This applies to this exercise and to any following exercises that also use real Twitter data.
This exercise is part of the course Introduction to Natural Language Processing in R.
Exercise instructions
- Create a corpus using the content column of russian_tweets.
- Attach both the following and followers columns as metadata to tweet_corpus.
- Print the first few rows of the metadata table.
Have a go at this exercise by completing this sample code.
# Load the tm package
library(tm)
# Create a corpus
tweet_corpus <- ___(___(russian_tweets$___))
# Attach following and followers
___(tweet_corpus, 'following') <- russian_tweets$___
___(tweet_corpus, 'followers') <- russian_tweets$___
# Review the metadata
head(meta(___))
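One way to complete the sample code is sketched below. It assumes the tm package is installed. Because the russian_tweets data is not included here, the sketch builds a small hypothetical data frame with the same column names (content, following, followers) and made-up values. VCorpus() and VectorSource() are tm's standard constructors for an in-memory corpus. meta() with no tag returns the document-level ("indexed") metadata as a data frame.

```r
# Assumes the tm package is installed: install.packages("tm")
library(tm)

# Hypothetical stand-in for russian_tweets: made-up rows using the
# same column names as the exercise (content, following, followers)
russian_tweets <- data.frame(
  content   = c("first example tweet", "second example tweet", "third example tweet"),
  following = c(120, 45, 300),
  followers = c(80, 1500, 20),
  stringsAsFactors = FALSE
)

# Build a volatile (in-memory) corpus with one document per tweet
tweet_corpus <- VCorpus(VectorSource(russian_tweets$content))

# Attach document-level metadata, one value per document
meta(tweet_corpus, "following") <- russian_tweets$following
meta(tweet_corpus, "followers") <- russian_tweets$followers

# With no tag, meta() returns the indexed metadata as a data frame
head(meta(tweet_corpus))
```

meta() defaults to the "indexed" metadata type, so each attached vector must supply exactly one value per document. In this case that means one value per row of russian_tweets.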