Other word clouds and word networks

1. Other word clouds and word networks

The wordcloud library provides multiple functions for creating word clouds from one or more corpora.

2. Commonality clouds

Think of your corpora as circles in a Venn diagram to understand the difference. To make a word cloud from a single corpus, you use the wordcloud function with rowSums, as before. Now we add a second corpus to the Venn diagram.

3. Commonality clouds

To find the intersection with commonality.cloud, you first use paste with the collapse argument on each document group. In this example, doing so reduces the coffee and chardonnay tweets from 2000 separate tweet documents to two documents. Next, you concatenate them with the c function, then clean and organize them into a corpus of only two documents. Because the collapsed documents become two columns in the TDM, you can simply convert it to a matrix and pass it to commonality.cloud without first using rowSums. The commonality.cloud function subsets the terms to only the words that are shared between the corpora.
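The steps above can be sketched roughly as follows. This is a minimal illustration, not the course's exact code: the six sample tweets are made up, and clean_corpus is a hypothetical helper standing in for whatever cleaning function you already use. It assumes the tm and wordcloud packages are installed.

```r
library(tm)
library(wordcloud)

# Made-up sample data standing in for the two tweet collections (assumption)
coffee_tweets <- c("Morning coffee and a good book",
                   "Need more coffee before this meeting",
                   "Coffee with friends after dinner")
chardonnay_tweets <- c("Chardonnay with friends tonight",
                       "A glass of chardonnay after dinner",
                       "Good chardonnay and a good book")

# Collapse each group into a single document, then concatenate with c()
all_coffee <- paste(coffee_tweets, collapse = " ")
all_chardonnay <- paste(chardonnay_tweets, collapse = " ")
all_tweets <- c(all_coffee, all_chardonnay)

# Hypothetical cleaning helper; swap in your own preprocessing steps
clean_corpus <- function(corpus) {
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords("en"))
  tm_map(corpus, stripWhitespace)
}

all_clean <- clean_corpus(VCorpus(VectorSource(all_tweets)))

# Two documents give a TDM with two columns, so no rowSums is needed
all_tdm <- TermDocumentMatrix(all_clean)
all_m <- as.matrix(all_tdm)

# Plot only the terms that appear in both documents
commonality.cloud(all_m, max.words = 100, colors = "steelblue1")
```

With this toy input only a handful of words are shared, so the cloud is tiny; on real tweet collections the same calls produce a fuller plot.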

4. Comparison clouds

The wordcloud library also has a function called comparison.cloud. Reviewing the Venn diagram, you can use comparison.cloud to understand the disjunction of the two corpora, that is, the words that set them apart.

5. Comparison clouds

As before, you paste and collapse the documents. Then you concatenate them using c and apply an appropriate clean_corpus function. However, once the corpus is organized into a TDM, you should explicitly define the column names by using colnames and passing in a vector of names. Shown here, it's "coffee", then "chardonnay". Lastly, use as.matrix to convert the object and pass it to the comparison.cloud function with some aesthetics. The comparison.cloud function will identify the words that are dissimilar, Term1 and Term_N, to make the plot.
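A rough sketch of that workflow is below. The sample tweets and colors are made up for illustration, and to keep the example short the cleaning is folded into the control argument of TermDocumentMatrix rather than done with a separate clean_corpus function; that shortcut is a choice of this sketch. It assumes the tm and wordcloud packages are installed.

```r
library(tm)
library(wordcloud)

# Made-up sample data standing in for the two tweet collections (assumption)
coffee_tweets <- c("Strong coffee this morning at the cafe",
                   "The barista made great coffee")
chardonnay_tweets <- c("A crisp chardonnay with dinner",
                       "Great chardonnay at the vineyard")

# Paste and collapse each group, then concatenate into two documents
all_tweets <- c(paste(coffee_tweets, collapse = " "),
                paste(chardonnay_tweets, collapse = " "))
all_corpus <- VCorpus(VectorSource(all_tweets))

# Shortcut for this sketch: clean while building the TDM
all_tdm <- TermDocumentMatrix(
  all_corpus,
  control = list(removePunctuation = TRUE, stopwords = TRUE)
)

# Name the columns explicitly so the plot labels each side correctly
colnames(all_tdm) <- c("coffee", "chardonnay")

all_m <- as.matrix(all_tdm)

# Words are placed on the side where they are relatively more frequent
comparison.cloud(all_m, colors = c("orange", "blue"), max.words = 50)
```

Without the colnames step the two halves of the plot would be labeled with the default document IDs, which is why the names are set before converting to a matrix.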

6. Pyramid plots

Another way to visualize the conjunction of two corpora is with a polarized tag plot. The pyramid.plot function from the plotrix library will make one. Same as before, you paste, collapse, concatenate and clean before making a TDM. Then you convert it to a matrix. However, instead of simply passing the matrix to a word cloud function, you use subset to identify the terms that both documents share. Then you calculate the absolute difference between the counts of the common words using abs and subtraction. The next steps are ordering by the difference and making a small data frame of the top terms.

7. Pyramid plots

This top25_df is then passed into the pyramid.plot function along with some aesthetics.
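The subset, difference, ordering and plotting steps might look like the sketch below. To keep the focus on those steps, it starts from a small hand-made term matrix with invented counts instead of a real TDM, and it calls the data frame top_df because the toy data has fewer than 25 shared terms. The plotting call is guarded so the data preparation still runs where plotrix is not installed.

```r
# Hand-made term matrix with invented counts (assumption), shaped like
# as.matrix() of a two-document TDM
all_m <- matrix(
  c(12,  3,
     5,  9,
     0,  7,
     8,  0,
     4,  4,
     2, 10),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("cup", "glass", "wine", "espresso", "morning", "dinner"),
    c("coffee", "chardonnay")
  )
)

# Keep only the terms that appear in both documents
common_words <- subset(all_m, all_m[, 1] > 0 & all_m[, 2] > 0)

# Absolute difference in counts, added as a third column
difference <- abs(common_words[, 1] - common_words[, 2])
common_words <- cbind(common_words, difference)

# Order by the difference, largest first
common_words <- common_words[order(common_words[, 3], decreasing = TRUE), ]

# Small data frame of the top terms (at most 25)
top_n <- min(25, nrow(common_words))
top_df <- data.frame(
  x = common_words[1:top_n, 1],
  y = common_words[1:top_n, 2],
  labels = rownames(common_words)[1:top_n]
)

# Draw the pyramid plot if plotrix is available
if (requireNamespace("plotrix", quietly = TRUE)) {
  plotrix::pyramid.plot(
    top_df$x, top_df$y,
    labels = top_df$labels,
    gap = 4,
    top.labels = c("coffee", "Words", "chardonnay"),
    main = "Words in Common",
    unit = "count"
  )
}
```

The gap value sets the width of the center label column in the same units as the counts, so it usually needs adjusting to the scale of your data.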

8. Word networks

The last plot is a word network. These plots treat words like a social network so you can visualize word relationships. The technical explanation of making an adjacency matrix to calculate the edges and nodes is best left to another course on social network analysis. Luckily for the text miner, the qdap library has a shortcut function called word_associate. In it, you specify the text column. Then you pass in a match.string word along with stop words and aesthetics. This gets you a basic word network very quickly, as the function handles the basic preprocessing steps and constructs the adjacency matrix of connections between words.
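For a feel of what an adjacency matrix of word connections is, here is a minimal base R sketch. It is only an illustration of the general idea and is not a description of how word_associate works internally; the sample tweets, the one-word stop list and the choice of counting co-occurrence within a tweet are all assumptions of this example.

```r
# Made-up sample tweets (assumption)
tweets <- c("the barista made great coffee",
            "great coffee great morning",
            "the barista smiled")

stop_words <- c("the")

# Split into lowercase words, drop stop words, keep each word once per tweet
tokens <- strsplit(tolower(tweets), "[^a-z]+")
tokens <- lapply(tokens, function(x) setdiff(x[nzchar(x)], stop_words))

# Term-by-document incidence matrix: 1 if the word occurs in the tweet
terms <- sort(unique(unlist(tokens)))
incidence <- sapply(tokens, function(x) as.integer(terms %in% x))
rownames(incidence) <- terms

# Adjacency matrix: number of tweets in which two words appear together
adjacency <- incidence %*% t(incidence)
diag(adjacency) <- 0  # ignore a word's link to itself

# Edges for one matched word: its neighbors and the connection strength
adjacency["barista", adjacency["barista", ] > 0]
```

Each nonzero cell is an edge between two word nodes, and the cell value can serve as the edge weight when the network is drawn.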

9. Let's practice!
