Creating a corpus

You have created a tibble called russian_tweets containing around 20,000 tweets that were automatically generated by bots during the 2016 U.S. election cycle, so that you can perform text analysis. After reviewing the available options for the analysis you have chosen, you believe the tm package offers the easiest path forward. To conduct the analysis, you must first create a corpus and attach potentially useful metadata.

Be aware that this is real data from Twitter, so there is always a risk that it contains profanity or other offensive content (in this exercise and in any following exercises that also use real Twitter data).

This exercise is part of the course

Introduction to Natural Language Processing in R

Exercise instructions

  • Create a corpus using the content column of russian_tweets.
  • Attach both the following and followers columns as metadata to tweet_corpus.
  • Print the first few rows of the metadata table.
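
The three steps above can be sketched on a small stand-in data set. This is only an illustration under stated assumptions: toy_tweets and toy_corpus are hypothetical names with invented rows, and VCorpus() is assumed as the corpus constructor (the exercise may expect a different one). VectorSource() and meta() are functions from the tm package.

```r
library(tm)

# Hypothetical stand-in for russian_tweets (invented rows, not real data)
toy_tweets <- data.frame(
  content   = c("first example tweet", "second example tweet", "third example tweet"),
  following = c(10, 250, 3),
  followers = c(5, 1200, 40),
  stringsAsFactors = FALSE
)

# VectorSource() treats each element of the character vector as one document;
# VCorpus() is assumed here as the corpus constructor
toy_corpus <- VCorpus(VectorSource(toy_tweets$content))

# Attach document-level metadata, one value per document
meta(toy_corpus, "following") <- toy_tweets$following
meta(toy_corpus, "followers") <- toy_tweets$followers

# Review the metadata table (one row per document)
head(meta(toy_corpus))
```

Because the metadata is stored per document, each vector assigned through meta() should have the same length as the corpus.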

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create a corpus
tweet_corpus <- ___(___(russian_tweets$___))

# Attach following and followers
___(tweet_corpus, 'following') <- russian_tweets$___
___(tweet_corpus, 'followers') <- russian_tweets$___

# Review the metadata
head(meta(___))