Visualize dissimilar words

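Say you want to visualize the words that the two sets of tweets do not have in common. You can do this with comparison.cloud(). The steps are similar to those for finding words in common, with one main difference: you label the columns of the TDM so that each corpus can be told apart.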

As when you were searching for words in common, you start by collapsing each group of tweets into its own document and combining those documents into a single VCorpus() object. Next, apply the clean_corpus() function and organize the result into a TermDocumentMatrix.
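As a rough sketch of these preparation steps, the snippet below builds a two-document corpus and a TDM with the tm package. The tweet vectors are invented stand-ins for the course data, and this clean_corpus() is a hypothetical version, since the course's predefined function is not shown here.

```r
library(tm)

# Invented example tweets; the real exercise works with preloaded data.
coffee_tweets <- c("I love my morning coffee", "Coffee shop with a great latte")
chardonnay_tweets <- c("A chilled glass of chardonnay", "Chardonnay pairs well with dinner")

# Collapse each topic into one string so that each topic becomes one document.
all_coffee <- paste(coffee_tweets, collapse = " ")
all_chardonnay <- paste(chardonnay_tweets, collapse = " ")

# Combine both documents into one corpus: coffee first, chardonnay second.
all_corpus <- VCorpus(VectorSource(c(all_coffee, all_chardonnay)))

# Hypothetical cleaning function; the course's own version may differ.
clean_corpus <- function(corpus) {
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords("en"))
  corpus <- tm_map(corpus, stripWhitespace)
  corpus
}

all_clean <- clean_corpus(all_corpus)

# Rows are terms, columns are the two documents.
all_tdm <- TermDocumentMatrix(all_clean)
```

The order in which the documents are combined matters later, because it determines which TDM column holds which topic.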

To keep track of which words belong to coffee versus chardonnay, set the column names of the TDM. The names are assigned by position, so list them in the same order as the documents in the corpus:

colnames(all_tdm) <- c("coffee", "chardonnay")
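A small sketch of the naming step, assuming the tm package and two made-up one-line documents (not the course data). It checks that each label ends up on the column holding that document's counts.

```r
library(tm)

# Two made-up documents: the coffee text comes first, the chardonnay text second.
docs <- c("coffee latte espresso coffee", "chardonnay wine glass chardonnay")
all_tdm <- TermDocumentMatrix(VCorpus(VectorSource(docs)))

# Label the columns in the same order as the documents above.
colnames(all_tdm) <- c("coffee", "chardonnay")

# Convert to a plain matrix to look up counts by term and by column label.
m <- as.matrix(all_tdm)
m["coffee", ]      # counts of the term "coffee" in each document
m["chardonnay", ]  # counts of the term "chardonnay" in each document
```

If the labels were supplied in the opposite order, the counts would stay where they are but would be attributed to the wrong topic, so it is worth confirming the document order before naming.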

Lastly, convert the object to a matrix using as.matrix() for use in comparison.cloud(). For each corpus passed to comparison.cloud(), you can specify a color, as in colors = c("red", "yellow", "green"), to make the sections distinguishable.
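To make the last step concrete, here is a minimal sketch that assumes the wordcloud package is installed. The counts are hypothetical and stand in for the result of as.matrix() on a two-column TDM; they are not taken from the course data.

```r
library(wordcloud)

# Hypothetical term counts: rows are terms, columns are the two corpora.
all_m <- matrix(
  c(12,  0,
     9,  1,
     6,  0,
     4,  4,
     0, 11,
     1,  8,
     0,  5),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Terms = c("coffee", "cup", "espresso", "love", "chardonnay", "wine", "glass"),
    Docs  = c("coffee", "chardonnay")
  )
)

# Word placement involves randomness; fix the seed for a repeatable layout.
set.seed(42)

# One color per column, in column order.
comparison.cloud(all_m, colors = c("orange", "blue"), max.words = 50)
```

Each word is drawn in the section of the corpus where it is relatively more frequent, and the column names are used as the section titles. This is why the matrix needs meaningful column names before plotting.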

This exercise is part of the course

Text Mining with Bag-of-Words in R

Exercise instructions

all_corpus is preloaded in your workspace.

  • Create all_clean by applying the predefined clean_corpus function to all_corpus.
  • Create all_tdm, a TermDocumentMatrix, from all_clean.
  • Use colnames() to label the corpora in all_tdm. Name the first column "coffee" and the second column "chardonnay".
  • Create all_m by converting all_tdm into matrix form.
  • Create a comparison.cloud() using all_m, with colors = c("orange", "blue") and max.words = 50.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Clean the corpus
___ <- ___(___)

# Create all_tdm
___ <- ___(___)

# Give the columns distinct names
___(___) <- ___

# Create all_m
___ <- ___(___)

# Create comparison cloud
comparison.cloud(___, ___ = c("___", "___"), max.words = ___)