Scaled Comparison Cloud
Recall the "grade inflation" of polarity scores in the rental reviews? Sometimes another way to uncover an insight is to center the scores at 0 and then subset the corpus. Because the mean moves to 0, some previously positive comments may fall into the negative subset (or, if the mean were negative, vice versa). scale() also divides by the standard deviation, but that does not change which side of 0 a score lands on. In this exercise you will scale the scores and then re-plot the comparison.cloud(). Removing the "grade inflation" can reveal insights that the raw scores hide.
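To see why centering moves comments across the 0 line, here is a minimal base-R sketch on a hypothetical vector of six polarity scores (made-up values, not taken from bos_pol). Five of the six raw scores are positive, but after scale() only the scores above the mean stay positive.

```r
# Hypothetical polarity scores: all but one are positive ("grade inflation")
polarity <- c(0.9, 0.4, 0.1, 0.6, -0.2, 0.3)

# scale() subtracts the mean and divides by the standard deviation;
# it returns a one-column matrix, so convert back to a plain numeric vector
scaled_polarity <- as.numeric(scale(polarity))

# Compare the sign of each score before and after centering
comparison <- data.frame(
  polarity,
  scaled_polarity = round(scaled_polarity, 3),
  raw_positive    = polarity > 0,
  scaled_positive = scaled_polarity > 0
)
print(comparison)

# The mean is 0.35, so 0.1 and 0.3 switch from positive to negative
sum(polarity > 0)         # five raw positives
sum(scaled_polarity > 0)  # three scaled positives
```

Wrapping scale() in as.numeric() only drops the matrix shape and its attributes; subset() works with either form, so this is a convenience choice rather than a requirement.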
Previously you applied polarity() to bos_reviews$comments and created a comparison.cloud(). In this exercise you will scale() the outcome before creating the comparison.cloud(). See if this shows something different in the visual!
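It helps to know roughly how a comparison cloud decides word size and side. The sketch below follows our reading of the wordcloud package's description of comparison.cloud(): each column is turned into within-document proportions, each term's average proportion across documents is subtracted, the word is sized by its largest deviation and placed on the side of the document where that deviation occurs. The term matrix and its counts are hypothetical, and this approximates the idea rather than reproducing the package's code.

```r
# Hypothetical two-column term matrix: rows are terms, columns are documents
tdm_m <- matrix(c(8, 2,
                  1, 6,
                  3, 3,
                  0, 4),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("great", "noise", "location", "dirty"),
                                c("positive", "negative")))

# 1. Convert each column to proportions of that document's total
props <- sweep(tdm_m, 2, colSums(tdm_m), "/")

# 2. Subtract each term's average proportion across the documents
dev <- props - rowMeans(props)

# 3. Size follows the largest deviation; side is the column where it occurs
size <- apply(dev, 1, max)
side <- colnames(dev)[apply(dev, 1, which.max)]

print(data.frame(size = round(size, 3), side))
```

A term used at similar rates in both documents (here "location") gets a small deviation, so it appears small even if it is common overall.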
Since this is largely a review exercise, most of the code follows a workflow you have already used; the main differences are the objects and parameters you supply.
This exercise is part of the course Sentiment Analysis in R.
Exercise instructions
- Review a section of the pre-loaded bos_pol$all by indexing [1:6, 1:3].
- Add a new column to bos_reviews called scaled_polarity, with scale() applied to the polarity score column bos_pol$all$polarity.
- For positive comments, subset() where the new column bos_reviews$scaled_polarity is greater than (>) zero.
- For negative comments, subset() where the new column bos_reviews$scaled_polarity is less than (<) zero.
- Create pos_terms by collapsing pos_comments into a single string with paste().
- Now create neg_terms by collapsing neg_comments the same way with paste().
- Combine the collapsed documents, pos_terms and neg_terms, into a single character vector called all_terms.
- Follow the usual tm workflow by nesting VectorSource() inside VCorpus(), applied to all_terms, to create all_corpus.
- Make the TermDocumentMatrix() using the all_corpus object. Note that this is a TF-IDF-weighted TDM with basic cleaning functions.
- Convert all_tdm to a matrix called all_tdm_m using as.matrix(). Then rename its columns to "positive" and "negative".
- Finally, apply comparison.cloud() to the matrix object all_tdm_m. Notice which words are now most prominent on the negative side. They may reveal an insight the unscaled scores hid!
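The subset, paste and combine steps can be tried in isolation on a toy example. This is a sketch with a hypothetical data frame, reviews, standing in for bos_reviews; the comments and scaled scores are invented for illustration.

```r
# Hypothetical stand-in for bos_reviews: comments plus already-scaled scores
reviews <- data.frame(
  comments = c("great host, lovely flat",
               "noisy street at night",
               "clean and central"),
  scaled_polarity = c(1.1, -0.9, -0.2),
  stringsAsFactors = FALSE
)

# Split the comments on the sign of the centered score
pos_comments <- subset(reviews$comments, reviews$scaled_polarity > 0)
neg_comments <- subset(reviews$comments, reviews$scaled_polarity < 0)

# Collapse each group into one long "document"
pos_terms <- paste(pos_comments, collapse = " ")
neg_terms <- paste(neg_comments, collapse = " ")

# Two-element character vector: one document per sentiment group
all_terms <- c(pos_terms, neg_terms)
print(all_terms)
```

A comment whose scaled score is exactly 0 satisfies neither > 0 nor < 0, so it is left out of both groups.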
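Sample solution
The code below fills in each blank according to the instructions above. It assumes bos_reviews and bos_pol (the output of polarity()) are already loaded, as in the course environment.
# Packages: tm for the corpus and TDM, wordcloud for comparison.cloud()
library(tm)
library(wordcloud)
# Review
bos_pol$all[1:6, 1:3]
# Scale/center & append
bos_reviews$scaled_polarity <- scale(bos_pol$all$polarity)
# Subset positive comments
pos_comments <- subset(bos_reviews$comments, bos_reviews$scaled_polarity > 0)
# Subset negative comments
neg_comments <- subset(bos_reviews$comments, bos_reviews$scaled_polarity < 0)
# Paste and collapse the positive comments
pos_terms <- paste(pos_comments, collapse = " ")
# Paste and collapse the negative comments
neg_terms <- paste(neg_comments, collapse = " ")
# Organize: one document per sentiment group
all_terms <- c(pos_terms, neg_terms)
# VCorpus
all_corpus <- VCorpus(VectorSource(all_terms))
# TDM (TF-IDF weighted, punctuation and English stopwords removed)
all_tdm <- TermDocumentMatrix(
  all_corpus,
  control = list(
    weighting = weightTfIdf,
    removePunctuation = TRUE,
    stopwords = stopwords(kind = "en")
  )
)
# Column names
all_tdm_m <- as.matrix(all_tdm)
colnames(all_tdm_m) <- c("positive", "negative")
# Comparison cloud
comparison.cloud(
  all_tdm_m,
  max.words = 100,
  colors = c("darkgreen", "darkred")
)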
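Because the TDM is TF-IDF weighted and the corpus has only two documents, the weighting matters for what the cloud can show. tm's weightTfIdf documentation describes the weight as a document-length-normalized term frequency times log2(number of documents / number of documents containing the term). The base-R sketch below applies that formula to hypothetical counts; it mirrors the documented formula and is not tm's own code.

```r
# Hypothetical raw counts for the two collapsed documents
counts <- matrix(c(5, 1,
                   2, 2,
                   0, 3),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("great", "stay", "noise"),
                                 c("positive", "negative")))

# Term frequency, normalized by each document's total term count
tf <- sweep(counts, 2, colSums(counts), "/")

# Inverse document frequency: log2(number of docs / docs containing the term)
idf <- log2(ncol(counts) / rowSums(counts > 0))

# Multiply each row of tf by that term's idf
tfidf <- tf * idf
print(round(tfidf, 3))
```

With two documents, any term that appears in both gets idf = log2(2/2) = 0, so if tm follows this formula, only terms unique to one side carry weight in all_tdm_m.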