Create Polarity Based Corpora

In this exercise you will perform Step 3 of the text mining workflow. Although qdap isn't a tidy package, you will mutate() a new column based on the returned polarity list, representing all polarity scores (that's a hint, BTW). In Chapter 3 we used a custom function, pol_subsections, which uses only base R declarations. However, to follow tidy principles, this exercise uses filter() and then introduces pull(). The pull() function works like [[ to extract a single variable.
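
As a minimal sketch of how pull() compares with [[, the snippet below assumes a small hypothetical data frame, toy_reviews, standing in for bos_reviews. Its polarity values are typed in by hand for illustration rather than computed by qdap.

```r
library(dplyr)

# Hypothetical stand-in for bos_reviews; the polarity values are made up
toy_reviews <- data.frame(
  comments = c("Great location", "Dirty room", "Lovely host"),
  polarity = c(0.8, -0.6, 0.7),
  stringsAsFactors = FALSE
)

# Base R: [[ extracts one column as a plain vector
base_comments <- toy_reviews[["comments"]]

# Tidy equivalent: pull() returns the same vector and sits at the end of a pipe
tidy_comments <- toy_reviews %>% pull(comments)

# Both approaches give the same character vector
identical(base_comments, tidy_comments)

# filter() keeps the matching rows first, then pull() extracts the column
positive_comments <- toy_reviews %>%
  filter(polarity > 0) %>%
  pull(comments)
```

Because pull() returns a vector rather than a data frame, its result can be passed straight to functions that expect a character vector.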

Once the comments are segregated, you collapse all the positive comments into one large document and all the negative comments into another, so that each document holds every word used in the positive or negative rental reviews.
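
The collapsing step can be sketched with base R's paste(). The comments here are hypothetical, and a single space is assumed as the separator.

```r
# Hypothetical positive comments (made up for illustration)
positive_comments <- c("Great location", "Lovely host")

# collapse joins every element into one string, separated by the given text
pos_doc <- paste(positive_comments, collapse = " ")

length(positive_comments)  # two separate comments
length(pos_doc)            # one combined document
```

The result is a length-one character vector, which is what lets each polarity group be treated as a single document later on.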

Lastly, you will create a Term Frequency Inverse Document Frequency (TFIDF) weighted Term Document Matrix (TDM). Since this exercise code starts with a tidy structure, some of the functions borrowed from tm are used along with the %>% operator to keep the style consistent. If the basics of the tm package aren't familiar, check out the Text Mining with Bag-of-Words in R course. Instead of simply counting the number of times a word is used (frequency), the values in the TDM are penalized for overused terms: a term that appears in many documents receives a lower weight, which helps reduce the influence of non-informative words.
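
To illustrate the difference between raw counts and TFIDF weights, here is a minimal sketch using tm on two hypothetical collapsed documents. The text is made up, and the object names (pos_doc, neg_doc, all_corpus, count_tdm, tfidf_tdm) are assumptions for illustration, not necessarily the ones the exercise expects.

```r
library(tm)

# Hypothetical collapsed documents (made up for illustration)
pos_doc <- "great location lovely host great stay"
neg_doc <- "dirty room noisy street great disappointment"

# Build a two-document corpus: document 1 is positive, document 2 is negative
all_corpus <- VCorpus(VectorSource(c(pos_doc, neg_doc)))

# Default weighting: plain term counts
count_tdm <- TermDocumentMatrix(all_corpus)

# TFIDF weighting: terms shared by many documents are down-weighted
tfidf_tdm <- TermDocumentMatrix(
  all_corpus,
  control = list(weighting = weightTfIdf)
)

# Compare the two matrices side by side
as.matrix(count_tdm)
as.matrix(tfidf_tdm)
```

In this toy corpus, "great" appears in both documents, so its inverse document frequency is log2(2/2) = 0 and its TFIDF weight drops to zero. Terms unique to one document, such as "lovely" or "dirty", keep a positive weight.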

This exercise is part of the course

Sentiment Analysis in R

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

pos_terms <- bos_reviews %>%
  # Add polarity column
  ___(polarity = ___) %>%
  # Filter for positive polarity
  ___(___) %>%
  # Extract comments column
  ___(___) %>% 
  # Paste and collapse
  ___(collapse = "___")