Visualize popular terms

1. Visualize popular terms

The most frequently used words in tweets are typically popular terms relevant to the topic tweeted.

2. Lesson Overview

In this lesson, we will extract and visualize popular terms from tweets. We will start with the corpus on "Obesity" and extract a list of the most frequent terms, which is used to identify custom stop words to remove from the corpus. Finally, the popular terms from the refined corpus are visualized using a bar plot and a word cloud.

3. Term frequency

The first step is to extract the number of occurrences of each word, called term frequency, using the freq_terms() function. This function takes two arguments: the corpus and the top "n" terms to be extracted based on the number of occurrences.
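As a rough illustration of this step, the sketch below calls freq_terms() from the qdap package on a small character vector. The `tweets` vector is a hypothetical stand-in for the lesson's corpus, and passing plain text rather than a corpus object is an assumption made to keep the example self-contained.

```r
library(qdap)

# Hypothetical stand-in for cleaned tweet text; the lesson's corpus is not reproduced here
tweets <- c(
  "obesity raises health risk",
  "diet and exercise can reduce obesity",
  "childhood obesity is a health concern",
  "healthy diet supports weight loss"
)

# Extract the most frequent terms; the result has a WORD column and a FREQ column
term_count <- freq_terms(tweets, top = 5)
term_count
```

The returned table is sorted by descending frequency, so the most common term appears in the first row.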

4. Term frequency

Take a look at the highlighted terms in the output. Terms like "obesity", "s", and "can" do not add value to the corpus on "Obesity" and can be removed as custom stop words.

5. Removing custom stop words

To do this, create a vector of custom stop words. Then apply the tm_map() and removeWords() functions to remove them from the corpus. tm_map() takes three arguments: the corpus, the removeWords function (passed by name, without parentheses), and the vector of custom stop words. The refined corpus now retains only the important terms.
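A minimal sketch of this removal step, assuming the tm package. The two-tweet corpus and the names `twt_corpus`, `custom_stop`, and `corp_refined` are hypothetical, and the stripWhitespace step is an optional addition that the lesson does not mention.

```r
library(tm)

# Hypothetical two-tweet corpus standing in for the lesson's "Obesity" corpus
tweets <- c("obesity can affect heart health", "diet can reduce obesity risk")
twt_corpus <- VCorpus(VectorSource(tweets))

# Custom stop words identified from the term frequency output
custom_stop <- c("obesity", "can")

# removeWords is passed as a function; tm_map() forwards custom_stop to it
corp_refined <- tm_map(twt_corpus, removeWords, custom_stop)

# Optional: removing words leaves gaps, so collapse the extra whitespace
corp_refined <- tm_map(corp_refined, stripWhitespace)

# Inspect the text of the first document
content(corp_refined[[1]])
```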

6. Term count after refining corpus

Let's extract and view the term frequency for the top 20 words from this refined corpus.

7. Term frequency after refining corpus

The popular terms related to tweets on "Obesity" can be seen here. A brand promoting an obesity management program can analyze these terms to understand the pulse of the audience.

8. Bar plot of popular terms

Let's create a bar plot of terms that occur more than 50 times using ggplot(). Bar plots help summarize popular terms in an easily interpretable form. The first step is to create a subset data frame of the terms that occur more than 50 times using the subset() function which takes two arguments: the frequent terms list and the condition FREQ column greater than 50.
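The subsetting step can be sketched with base R alone. The `term_count` data frame below is made-up data in the WORD/FREQ layout, not output from the lesson's corpus.

```r
# Hypothetical frequency table with WORD and FREQ columns
term_count <- data.frame(
  WORD = c("health", "diet", "weight", "risk", "sugar"),
  FREQ = c(120, 85, 64, 50, 31),
  stringsAsFactors = FALSE
)

# Keep only terms occurring more than 50 times (a count of exactly 50 is excluded)
term50 <- subset(term_count, FREQ > 50)
term50
```

Because the condition is a strict inequality, a term that occurs exactly 50 times is dropped.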

9. Bar plot of most popular terms

To create the bar plot, ggplot() takes the data frame "term50" and, under aesthetics, the words sorted in descending order of the FREQ column for the x-axis and the frequency counts for the y-axis. Two layers are then added: geom_bar() with stat set to "identity" and fill set to "blue", and theme() with arguments that rotate the word labels 45 degrees and place them below the x-axis.
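One way this plot could be written with ggplot2 is sketched below. The `term50` values are made up, using reorder() for the descending sort is an assumption about how the sorting is done, and the axis labels are an addition for readability.

```r
library(ggplot2)

# Hypothetical subset of terms occurring more than 50 times
term50 <- data.frame(
  WORD = c("diet", "health", "weight"),
  FREQ = c(85, 120, 64),
  stringsAsFactors = FALSE
)

# reorder(WORD, -FREQ) orders the bars from highest to lowest frequency
bar_plot <- ggplot(term50, aes(x = reorder(WORD, -FREQ), y = FREQ)) +
  geom_bar(stat = "identity", fill = "blue") +
  # Rotate the x-axis labels 45 degrees and right-align them under the bars
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Term", y = "Frequency")

print(bar_plot)
```

Setting stat to "identity" tells geom_bar() to use the FREQ values as bar heights instead of counting rows.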

10. Bar plot of popular terms

Let's check the bar plot. We can see several words related to obesity in the list. The word health is at the top of the list, indicating the impact of obesity on health.

11. Word cloud

Let's now create another interesting visualization of the frequent terms using word clouds. A word cloud is an image made up of words in which the size of each word indicates its frequency. It is an effective promotional image for marketing campaigns as it communicates the brand messaging and highlights popular terms to convey the value of the content being shared.

12. Word cloud based on min frequency

To create word clouds, we use the wordcloud() function with the following arguments: the corpus, min.freq set to the minimum frequency a term needs to be included, colors set to "red", scale set to the range of font sizes, and random.order set to FALSE so that words are plotted in decreasing order of frequency rather than at random.
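A minimal sketch of such a call, assuming the wordcloud package. To keep the example self-contained it passes words and frequencies from a made-up table instead of a corpus, and the threshold and font-size range are illustrative values only.

```r
library(wordcloud)

# Hypothetical term frequencies; the lesson passes its corpus directly instead
term_count <- data.frame(
  WORD = c("health", "diet", "weight", "risk", "sugar", "exercise"),
  FREQ = c(120, 85, 64, 50, 31, 12),
  stringsAsFactors = FALSE
)

set.seed(42)  # word placement involves randomness, so fix the seed for repeatability

wordcloud(
  words = term_count$WORD,
  freq = term_count$FREQ,
  min.freq = 20,         # skip terms that occur fewer than 20 times
  colors = "red",
  scale = c(3, 0.5),     # largest and smallest font sizes
  random.order = FALSE   # plot the most frequent words first
)
```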

13. Word cloud based on min frequency

A word cloud highlighting high-frequency words in large font sizes is displayed as output.

14. Colorful word cloud

Let's make the word cloud colorful using the RColorBrewer library. We assign 6 colors from the "Dark2" palette using brewer.pal() and set the max.words argument to 100 to plot a word cloud of the top 100 words.
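The colorful variant might look like the sketch below, assuming the wordcloud and RColorBrewer packages. The term table and the name `dark_colors` are hypothetical, and words and frequencies are passed explicitly so the example runs without the lesson's corpus.

```r
library(wordcloud)
library(RColorBrewer)

# Hypothetical term frequencies standing in for the refined corpus
term_count <- data.frame(
  WORD = c("health", "diet", "weight", "risk", "sugar", "exercise"),
  FREQ = c(120, 85, 64, 50, 31, 12),
  stringsAsFactors = FALSE
)

# Six colors from the "Dark2" palette
dark_colors <- brewer.pal(6, "Dark2")

set.seed(42)  # fix the layout for repeatability

wordcloud(
  words = term_count$WORD,
  freq = term_count$FREQ,
  max.words = 100,       # plot at most the 100 most frequent words
  colors = dark_colors,
  scale = c(3, 0.5),
  random.order = FALSE
)
```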

15. Colorful word cloud

We now have an interesting word cloud depicting popular terms from tweets on obesity.

16. Let's practice!

We visualized the popular terms using the bar plot and word cloud. Let's practice!
