Create Polarity Based Corpora
In this exercise you will perform Step 3 of the text mining workflow. Although qdap isn't a tidy package, you will mutate() a new column based on the returned polarity list, which represents all polarity scores (that's a hint, BTW). In chapter 3 we used a custom function, pol_subsections, which uses only base R declarations. However, to follow tidy principles, this exercise uses filter() and then introduces pull(). The pull() function works like [[ to extract a single variable.
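To see how pull() relates to [[, here is a minimal sketch on made-up data. The data frame toy_reviews and its values are hypothetical stand-ins, not the bos_reviews data from the exercise, and the polarity column is typed in by hand rather than computed with qdap.

```r
library(dplyr)

# Hypothetical toy data; the real exercise works on bos_reviews
toy_reviews <- data.frame(
  comments = c("great stay", "awful host", "lovely place"),
  polarity = c(0.7, -0.5, 0.6),
  stringsAsFactors = FALSE
)

# Tidy style: keep positive rows, then extract one column as a vector
pos_comments <- toy_reviews %>%
  filter(polarity > 0) %>%
  pull(comments)

# Base R equivalent using [[ on the subsetted data frame
pos_comments_base <- toy_reviews[toy_reviews$polarity > 0, ][["comments"]]

# Both approaches return the same character vector
same_result <- identical(pos_comments, pos_comments_base)
```

Both routes return a plain character vector rather than a one-column data frame, which is what a later paste() step needs.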
Once segregated, you collapse all the positive and negative comments into two larger documents, one containing all words from the positive rental reviews and one containing all words from the negative ones.
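The collapsing step can be sketched with base R's paste(). The comment vectors below are invented for illustration, and the name all_terms is an assumption rather than something the exercise defines.

```r
# Hypothetical comment vectors, one element per review
pos_comments <- c("great stay", "lovely place")
neg_comments <- c("awful host", "noisy street")

# collapse joins all elements of a vector into a single string
pos_terms <- paste(pos_comments, collapse = " ")
neg_terms <- paste(neg_comments, collapse = " ")

# Two documents: one for positive reviews, one for negative reviews
all_terms <- c(pos_terms, neg_terms)
```

After this step each polarity class is a single document, so a term document matrix built from all_terms would have exactly two columns.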
Lastly, you will create a Term Frequency Inverse Document Frequency (TFIDF) weighted Term Document Matrix (TDM). Since this exercise code starts with a tidy structure, some of the functions borrowed from tm are used along with the %>% operator to keep the style consistent. If the basics of the tm package aren't familiar, check out the Text Mining with Bag-of-Words in R course. Instead of simply counting the number of times a word is used (frequency), the values in the TDM are penalized for overused terms, which helps reduce non-informative words.
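To make the penalty concrete, here is a base R sketch of one common TFIDF formulation: term frequency normalized by document length, multiplied by log2 of the number of documents over the number of documents containing the term. The two toy documents are invented, and the weighting functions in tm may differ in detail, so treat this as an illustration rather than what the exercise code computes.

```r
# Two hypothetical collapsed documents
docs <- c(pos = "great stay great host", neg = "awful stay rude host")

# Split each document into words and collect the vocabulary
tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens)))

# Raw counts: one row per term, one column per document
tdm <- vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = terms))),
  numeric(length(terms))
)
rownames(tdm) <- terms

# Term frequency, normalized by the length of each document
tf <- sweep(tdm, 2, colSums(tdm), "/")

# Inverse document frequency: terms found in every document get log2(1) = 0
doc_freq <- rowSums(tdm > 0)
idf <- log2(ncol(tdm) / doc_freq)

# Each row of tf is scaled by that term's idf
tfidf <- tf * idf
```

In this sketch "stay" and "host" appear in both documents, so their weights drop to zero, while "great" and "awful" keep a positive weight in the one document that uses them.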
This exercise is part of the course
Sentiment Analysis in R
Interactive exercise
Try this exercise by completing this sample code.
pos_terms <- bos_reviews %>%
  # Add polarity column
  ___(polarity = ___) %>%
  # Filter for positive polarity
  ___(___) %>%
  # Extract comments column
  ___(___) %>%
  # Paste and collapse
  ___(collapse = "___")