Creating a corpus

You have created a tibble called russian_tweets that contains around 20,000 tweets automatically generated by bots during the 2016 U.S. election cycle, so that you can perform text analysis. After reviewing the available options for your chosen analysis, you believe the tm package offers the easiest path forward. Before you can conduct the analysis, you must first create a corpus and attach potentially useful metadata.

Be aware that this is real data from Twitter and as such there is always a risk that it may contain profanity or other offensive content (in this exercise, and any following exercises that also use real Twitter data).

This exercise is part of the course

Introduction to Natural Language Processing in R

Instructions

Create a corpus using the content column of russian_tweets.
Attach both the following and followers columns as metadata to tweet_corpus.
Print the first few rows of the metadata table.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Create a corpus
tweet_corpus <- ___(___(russian_tweets$___))

# Attach following and followers
___(tweet_corpus, 'following') <- russian_tweets$___
___(tweet_corpus, 'followers') <- russian_tweets$___

# Review the metadata
head(meta(___))
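
The same corpus-and-metadata pattern can be sketched on a tiny made-up data frame. This is only an illustration under assumptions, not the exercise solution: toy_tweets and toy_corpus are hypothetical names, and the three rows are invented stand-ins for russian_tweets. It uses tm's VectorSource() to wrap a character vector, VCorpus() to build an in-memory corpus, and the meta() replacement form to attach one value per document.

```r
library(tm)

# Hypothetical toy data standing in for russian_tweets (not the real dataset)
toy_tweets <- data.frame(
  content   = c("first example tweet", "second example tweet", "third example tweet"),
  following = c(10, 250, 3),
  followers = c(5, 1200, 40),
  stringsAsFactors = FALSE
)

# Wrap the text vector in a source, then build a volatile (in-memory) corpus
toy_corpus <- VCorpus(VectorSource(toy_tweets$content))

# Attach document-level metadata: one value per document, in the same order
meta(toy_corpus, "following") <- toy_tweets$following
meta(toy_corpus, "followers") <- toy_tweets$followers

# Inspect the metadata data frame (one row per document)
head(meta(toy_corpus))
```

Because the metadata vectors come from the same data frame as the text, each value stays aligned with its document by position; reordering or filtering the rows after building the corpus would break that alignment.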

Skill level: Intermediate

In chapter 4 we cover two staples of natural language processing: sentiment analysis and word embeddings. Both techniques are a must for anyone learning the fundamentals of text analysis. You will also briefly learn about BERT, part-of-speech tagging, and named entity recognition. Almost 15 different analysis techniques are covered in this course, so chapter 4 ends by recapping all of them.

Exercise 1: Sentiment analysis
Exercise 2: tidytext lexicons
Exercise 3: Sentiment scores
Exercise 4: Sentiment and emotion
Exercise 5: Word embeddings
Exercise 6: h2o practice
Exercise 7: word2vec
Exercise 8: Additional NLP analysis
Exercise 9: Reviewing methods #1
Exercise 10: Review methods #2
Exercise 11: Conclusion