Visualize dissimilar words
Say you want to visualize the words not in common. To do this, you can also use comparison.cloud(), and the steps are quite similar with one main difference.
As when you were searching for words in common, you start by unifying the tweets for each topic into distinct corpora and combining them into a single VCorpus() object. Next, apply a clean_corpus() function and organize the result into a TermDocumentMatrix.
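The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the tweet vectors are made-up stand-ins, and the body of clean_corpus() is an assumption, since the exercise only says the function is predefined.

```r
library(tm)

# Hypothetical stand-ins for the two sets of tweets (not the course data)
coffee_tweets <- c("I love my morning coffee", "Coffee shop latte is great")
chardonnay_tweets <- c("A glass of chardonnay tonight", "Chardonnay wine is great")

# Collapse each topic's tweets into a single document
all_coffee <- paste(coffee_tweets, collapse = " ")
all_chardonnay <- paste(chardonnay_tweets, collapse = " ")

# Combine the two documents into one corpus
all_corpus <- VCorpus(VectorSource(c(all_coffee, all_chardonnay)))

# Assumed cleaning steps; the predefined clean_corpus() may differ
clean_corpus <- function(corpus) {
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords("en"))
  corpus <- tm_map(corpus, stripWhitespace)
  corpus
}

# Clean the corpus, then count terms per document
all_clean <- clean_corpus(all_corpus)
all_tdm <- TermDocumentMatrix(all_clean)
```

Each column of all_tdm corresponds to one of the two documents, in the order they were passed to VectorSource().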
To keep track of which words belong to coffee versus chardonnay, you can set the column names of the TDM like this:
colnames(all_tdm) <- c("chardonnay", "coffee")
Lastly, convert the object to a matrix using as.matrix() for use in comparison.cloud(). For each distinct corpus passed to comparison.cloud(), you can specify a color, as in colors = c("red", "yellow", "green"), to make the sections distinguishable.
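As a rough end-to-end sketch of the naming, conversion, and plotting steps, the snippet below builds a tiny term-document matrix from two made-up documents (an assumption for illustration, not the course data) and passes it to comparison.cloud() with one color per column.

```r
library(tm)
library(wordcloud)

# Two hypothetical documents standing in for the combined tweets
docs <- c("coffee latte morning espresso coffee",
          "chardonnay wine glass oak chardonnay")

# Build the TDM and label each column with its topic
all_tdm <- TermDocumentMatrix(VCorpus(VectorSource(docs)))
colnames(all_tdm) <- c("coffee", "chardonnay")

# comparison.cloud() expects an ordinary matrix of term counts
all_m <- as.matrix(all_tdm)

# One color per column; max.words caps how many terms are drawn
comparison.cloud(all_m, colors = c("orange", "blue"), max.words = 50)
```

The column order determines which color goes with which topic, so the names given to colnames() should match the order in which the documents entered the corpus.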
This exercise is part of the course Text Mining with Bag-of-Words in R.
Exercise instructions
all_corpus is preloaded in your workspace.
- Create all_clean by applying the predefined clean_corpus function to all_corpus.
- Create all_tdm, a TermDocumentMatrix, from all_clean.
- Use colnames() to rename each distinct corpus within all_tdm. Name the first column "coffee" and the second column "chardonnay".
- Create all_m by converting all_tdm into matrix form.
- Create a comparison.cloud() using all_m, with colors = c("orange", "blue") and max.words = 50.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Clean the corpus
___ <- ___(___)
# Create all_tdm
___ <- ___(___)
# Give the columns distinct names
___(___) <- ___
# Create all_m
___ <- ___(___)
# Create comparison cloud
comparison.cloud(___, ___ = c("___", "___"), max.words = ___)