Create Polarity Based Corpora
In this exercise you will perform Step 3 of the text mining workflow. Although qdap isn't a tidy package, you will mutate() a new column based on the returned polarity list representing all polarity (that's a hint BTW) scores. In chapter 3 we used a custom function, pol_subsections, which uses only base R declarations. However, in following the tidy principles, this exercise uses filter() and then introduces pull(). The pull() function works like [[ to extract a single variable.
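To see how pull() compares with [[, here is a minimal sketch on a toy data frame. The `reviews` object and its polarity values are made up for illustration; they are not the course's bos_reviews data and were not computed by qdap.

```r
library(dplyr)

# Hypothetical stand-in for bos_reviews; polarity values are invented,
# not scored by qdap::polarity()
reviews <- tibble(
  comments = c(
    "Great location and a lovely host",
    "Dirty room, would not return",
    "It was fine"
  ),
  polarity = c(0.8, -0.6, 0)
)

# filter() keeps the matching rows; pull() returns one column as a plain vector
pos_comments <- reviews %>%
  filter(polarity > 0) %>%
  pull(comments)

# The base R equivalent uses [[ on the filtered rows
pos_comments_base <- reviews[reviews$polarity > 0, ][["comments"]]

# Both approaches should give the same character vector
identical(pos_comments, pos_comments_base)
```

Because pull() returns a vector rather than a one-column data frame, its result can be piped straight into functions such as paste().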
Once segregated, you collapse all the positive and negative comments into two larger documents, one containing every word from the positive rental reviews and one containing every word from the negative ones.
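The collapsing step can be sketched with base R's paste(). The comment vectors below are invented examples, and the single-space separator is an assumption chosen so that words from adjacent comments do not run together.

```r
# Hypothetical comment vectors, as they might look after filter() and pull()
pos_comments <- c("Great location", "Lovely host")
neg_comments <- c("Dirty room", "Noisy street")

# collapse joins all elements of a vector into one string
pos_terms <- paste(pos_comments, collapse = " ")
neg_terms <- paste(neg_comments, collapse = " ")

# Two "documents": one for positive reviews, one for negative reviews
all_terms <- c(pos_terms, neg_terms)
length(all_terms)
```

Each element of `all_terms` is now a single long string, which is the shape a two-document corpus expects.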
Lastly, you will create a Term Frequency Inverse Document Frequency (TFIDF) weighted Term Document Matrix (TDM). Since this exercise code starts with a tidy structure, some of the functions borrowed from tm are used along with the %>% operator to keep the style consistent. If the basics of the tm package aren't familiar, check out the Text Mining with Bag-of-Words in R course. Instead of simply counting the number of times a word is used (frequency), the values in the TDM are penalized for overused terms, which helps downweight non-informative words.
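As a rough illustration of the TFIDF weighting, the sketch below builds a two-document TDM with tm. The `all_terms` strings are invented, and the object names are placeholders rather than the ones the exercise expects.

```r
library(tm)
library(dplyr)  # loaded here only for the %>% operator

# Hypothetical collapsed documents: one positive, one negative
all_terms <- c(
  "great location lovely host great stay",
  "dirty room noisy street host unhelpful"
)

# Build a corpus, then a TDM weighted by TFIDF instead of raw counts
all_tdm <- all_terms %>%
  VectorSource() %>%
  VCorpus() %>%
  TermDocumentMatrix(control = list(weighting = weightTfIdf))

# Convert to a plain matrix to inspect the weights
all_tdm_m <- as.matrix(all_tdm)

# "host" appears in both documents, so its weight is penalized to zero;
# "great" appears in only one document, so it keeps a positive weight
all_tdm_m[c("host", "great"), ]
```

With only two documents, any term shared by both gets an inverse document frequency of zero, which makes the penalty easy to see. On a larger corpus the effect is more gradual.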
This exercise is part of the course Sentiment Analysis in R.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
pos_terms <- bos_reviews %>%
  # Add polarity column
  ___(polarity = ___) %>%
  # Filter for positive polarity
  ___(___) %>%
  # Extract comments column
  ___(___) %>%
  # Paste and collapse
  ___(collapse = "___")