Create a Tidy Text Tibble!

Now that you have learned about tidy principles, this exercise helps you organize your data into a tibble so you can work within the tidyverse!

Previously you learned that applying tidy() to a TermDocumentMatrix() object converts the TDM to a tibble. In this exercise you will create the word data directly from the review column called comments.

First, you use unnest_tokens() to make the text lowercase and tokenize the reviews into single words, one word per row.
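As a minimal sketch of this step, the snippet below runs unnest_tokens() on a tiny made-up data frame. The object toy_reviews and its two rows are hypothetical stand-ins for bos_reviews; only the column names id and comments follow the exercise.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for bos_reviews: one row per review
toy_reviews <- tibble(
  id = c(1, 2),
  comments = c("The room was GREAT", "Not a great stay")
)

# Tokenize comments into a new column called word.
# unnest_tokens() lowercases the text by default and returns one row per word.
toy_tidy <- toy_reviews %>%
  unnest_tokens(word, comments)

toy_tidy
```

The first argument after the data is the name of the output column and the second is the input column, both unquoted.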

Sometimes it is useful to capture the original word order within each group of a corpus. To do so, group the data with group_by() and then use mutate(). In mutate() you will use seq_along() to create a sequence of numbers from 1 to the number of words in each group. This will capture the word order as it was written.
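The grouping step can be sketched on the same kind of toy data. Here toy_reviews is again a hypothetical stand-in for bos_reviews; because the data is grouped by id, seq_along(word) restarts at 1 for each review.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for bos_reviews
toy_reviews <- tibble(
  id = c(1, 2),
  comments = c("The room was GREAT", "Not a great stay")
)

toy_ordered <- toy_reviews %>%
  unnest_tokens(word, comments) %>%
  group_by(id) %>%                                 # one group per review
  mutate(original_word_order = seq_along(word))    # position of each word within its review

toy_ordered
```

The result is still grouped by id, so later verbs operate per review unless you call ungroup() first.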

In the tm package, you would use removeWords() to remove stopwords. In the tidyverse you first need to load the stop words lexicon and then apply an anti_join() between the tidy text data frame and the stopwords.
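A small sketch of the stopword removal, again on hypothetical toy data rather than bos_reviews. It spells out by = "word" for clarity; if you omit it, anti_join() joins on the shared word column and prints a message saying so.

```r
library(dplyr)
library(tidytext)

# Load the stop_words data frame that ships with tidytext
data("stop_words")

# Hypothetical stand-in for bos_reviews
toy_reviews <- tibble(
  id = c(1, 2),
  comments = c("The room was GREAT", "Not a great stay")
)

toy_tidy <- toy_reviews %>%
  unnest_tokens(word, comments)

# Keep only the rows whose word does NOT appear in stop_words
toy_clean <- toy_tidy %>%
  anti_join(stop_words, by = "word")

toy_clean
```

anti_join() returns the rows of the left table that have no match in the right table, which is why it works as a stopword filter.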

This exercise is part of the course

Sentiment Analysis in R

Exercise instructions

  • Create tidy_reviews by piping (%>%) the original reviews object bos_reviews to unnest_tokens(). Pass in the new column name, word, followed by the input column, comments. Remember that in the tidyverse you don't need a $ or quotes to refer to columns.
  • Create a new variable the tidy way! Overwrite tidy_reviews by piping tidy_reviews to group_by() with the column id, then pipe (%>%) again to mutate(). Within mutate(), create a new variable original_word_order equal to seq_along(word).
  • Print out the tibble, tidy_reviews.
  • Load the premade stop_words lexicon, which includes the "SMART" stopword list, into your R session with data("stop_words").
  • Create tidy_reviews_without_stopwords by piping (%>%) tidy_reviews to anti_join(). Within anti_join(), pass in the stop_words lexicon.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Vector to tibble
tidy_reviews <- bos_reviews %>% 
  ___(___, ___)

# Group by and mutate
tidy_reviews <- tidy_reviews %>% 
  ___(___) %>% 
  ___(original_word_order = ___(___))

# Quick review
___

# Load stopwords
___

# Perform anti-join
tidy_reviews_without_stopwords <- tidy_reviews %>% 
  ___(___)