Create Polarity Based Corpora

In this exercise you will perform Step 3 of the text mining workflow. Although qdap isn't a tidy package, you will mutate() a new column based on the list returned by polarity(), specifically the element holding all (that's a hint) polarity scores. In chapter 3 we used a custom function, pol_subsections(), built only from base R functions. To follow tidy principles instead, this exercise uses filter() and then introduces pull(). The pull() function works like [[ to extract a single variable as a vector.
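To see the pull() and [[ equivalence concretely, here is a small sketch on a hypothetical toy data frame (`reviews` and its values are invented for illustration). It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data frame standing in for the review data
reviews <- data.frame(
  comments = c("great place", "dirty room", "lovely host"),
  polarity = c(0.5, -0.4, 0.6),
  stringsAsFactors = FALSE
)

# pull() extracts one column as a plain vector (column name given without quotes)
via_pull <- reviews %>% pull(comments)

# Base R [[ extraction returns the same vector
via_brackets <- reviews[["comments"]]

identical(via_pull, via_brackets)
# [1] TRUE
```

Unlike `reviews["comments"]` or `select(reviews, comments)`, which both return a one-column data frame, pull() and [[ return the bare character vector that paste() expects later.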

Once the comments are separated, you collapse all the positive comments and all the negative comments into two larger documents, representing all words among the positive and negative rental reviews respectively.

Lastly, you will create a Term Frequency-Inverse Document Frequency (TF-IDF) weighted Term Document Matrix (TDM). Since this exercise's code starts with a tidy structure, some functions borrowed from tm are used along with the %>% operator to keep the style consistent. If the basics of the tm package are unfamiliar, check out the Text Mining with Bag-of-Words in R course. Instead of simply counting how many times a word is used (frequency), TF-IDF penalizes terms that appear across many documents, which helps reduce the weight of non-informative words.
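The weighting itself can be sketched in base R on two hypothetical collapsed documents (text invented for illustration). This uses one common TF-IDF variant: term frequency normalized by document length, multiplied by log2(number of documents / number of documents containing the term). That variant is assumed to be close to tm's weightTfIdf default, but check the tm documentation before relying on exact values; the exercise itself builds the TDM with tm rather than by hand.

```r
# Two hypothetical collapsed documents (invented text)
docs <- c(pos = "great clean great location",
          neg = "dirty noisy location")

# Tokenize on whitespace; strsplit() keeps the document names
tokens <- strsplit(docs, "\\s+")

# Raw counts as a term-document matrix: rows = terms, columns = documents
terms <- sort(unique(unlist(tokens)))
tdm <- vapply(tokens,
              function(tk) as.numeric(table(factor(tk, levels = terms))),
              numeric(length(terms)))
rownames(tdm) <- terms

# Term frequency normalized by document length (column sums)
tf <- sweep(tdm, 2, colSums(tdm), "/")

# Inverse document frequency: log2(number of documents / documents containing term)
doc_freq <- rowSums(tdm > 0)
idf <- log2(ncol(tdm) / doc_freq)

# TF-IDF weights; idf recycles down each column, one value per term (row)
tfidf <- tf * idf
round(tfidf, 3)
```

Here "location" appears in both documents, so its idf is log2(2/2) = 0 and its weight drops to zero in both columns, while "great" scores 0.5 in the positive document (2 of 4 tokens times an idf of 1). With only two documents, any word shared by the positive and negative corpora is zeroed out, which is exactly what makes the remaining terms informative.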

Get the positive comments.
- Mutate to add a polarity column, equal to bos_pol$all$polarity.
- Filter to keep rows where polarity is greater than zero.
- Use pull() to extract the comments column. (Pass this column without quotes.)
- Collapse into a single string, separated by spaces, using paste() with collapse = " ".
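The four steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: `bos_reviews` is a hypothetical name for the review data frame with a `comments` column, and `bos_pol` is a hypothetical stand-in that mimics only the `bos_pol$all$polarity` piece of qdap's polarity() output, with invented scores. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for the review data (invented comments)
bos_reviews <- data.frame(
  comments = c("Great host and spotless room",
               "The room was dirty and loud",
               "Lovely neighborhood"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for qdap polarity() output: only $all$polarity is mimicked
bos_pol <- list(all = data.frame(polarity = c(0.8, -0.6, 0.7)))

# Positive comments: add scores, keep polarity > 0, extract, collapse
pos_terms <- bos_reviews %>%
  mutate(polarity = bos_pol$all$polarity) %>%
  filter(polarity > 0) %>%
  pull(comments) %>%
  paste(collapse = " ")

# Negative comments mirror the same steps with polarity < 0
neg_terms <- bos_reviews %>%
  mutate(polarity = bos_pol$all$polarity) %>%
  filter(polarity < 0) %>%
  pull(comments) %>%
  paste(collapse = " ")

pos_terms
# [1] "Great host and spotless room Lovely neighborhood"
```

Comments scoring exactly zero fall into neither document, which keeps neutral reviews from diluting either corpus.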
