Quick taste of text mining

Sometimes we can find out the author's intent and main ideas just by looking at the most common words.

At its heart, bag-of-words text mining is a way to count terms, or n-grams, across a collection of documents. Consider the following sentences, which we've saved in a variable called text and made available in your workspace:

text <- "Text mining usually involves the process of structuring the input text. The overarching goal is, essentially, to turn text into data for analysis, via the application of natural language processing (NLP) and analytical methods."
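To make "counting terms" concrete, here is a minimal base-R sketch of a single-word (unigram) bag of words for text. It is not how qdap works internally. It lowercases the string, splits it on anything that is not a letter, and tallies the pieces with table(). The letters-only tokenization rule is an assumption made for illustration: it drops punctuation and turns "NLP" into "nlp".

```r
text <- "Text mining usually involves the process of structuring the input text. The overarching goal is, essentially, to turn text into data for analysis, via the application of natural language processing (NLP) and analytical methods."

# Lowercase, then split on any run of characters that are not a-z
tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
tokens <- tokens[tokens != ""]  # guard against empty strings from leading punctuation

# The bag of words: each unique term mapped to how often it occurs, most frequent first
counts <- sort(table(tokens), decreasing = TRUE)
head(counts, 3)
```

With this rule the sentences contain 34 words and 28 unique terms. "the" appears 4 times, "text" 3 times and "of" twice; every other word appears once.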

Manually counting the words in the sentences above is a pain! Fortunately, the qdap package offers a better alternative. You can easily find the 4 most frequent terms in text (including ties) by passing text and 4 to the freq_terms() function.

library(qdap)
frequent_terms <- freq_terms(text, 4)
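The "(including ties)" detail matters more than it might seem. The sketch below uses a hypothetical helper, top_terms_with_ties(), written in base R to illustrate the idea; it is not qdap's implementation. The helper finds the count of the nth most frequent term and keeps every term whose count reaches that cutoff. As a result, asking for n terms can return more than n.

```r
text <- "Text mining usually involves the process of structuring the input text. The overarching goal is, essentially, to turn text into data for analysis, via the application of natural language processing (NLP) and analytical methods."

# Simple letters-only tokenization and counting (an illustrative assumption)
tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
tokens <- tokens[tokens != ""]
counts <- sort(table(tokens), decreasing = TRUE)

# Hypothetical helper: the top n terms, extended to any term tied with the nth one
top_terms_with_ties <- function(counts, n) {
  counts <- sort(counts, decreasing = TRUE)
  cutoff <- counts[[min(n, length(counts))]]  # frequency of the nth term
  counts[counts >= cutoff]
}

top3 <- top_terms_with_ties(counts, 3)
top4 <- top_terms_with_ties(counts, 4)
length(top3)
length(top4)
```

Under these assumptions, asking for 3 terms returns the, text and of. Asking for 4 returns all 28 unique words, because only three words occur more than once and the remaining 25 tie at a single occurrence.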

The frequent_terms object stores the most frequent words and their counts. You can then make a bar chart simply by calling plot() on frequent_terms.

plot(frequent_terms)
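To see roughly what such a chart shows without qdap, here is a base-R sketch that draws the counts with barplot(). It is only an approximation: qdap's own plot method styles its chart differently. The cutoff of "at least two occurrences" is an arbitrary choice for this example. The pdf(NULL) line sends output to a null device so the sketch also runs in a non-interactive session.

```r
text <- "Text mining usually involves the process of structuring the input text. The overarching goal is, essentially, to turn text into data for analysis, via the application of natural language processing (NLP) and analytical methods."

# Letters-only tokenization and counting (an illustrative assumption)
tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
tokens <- tokens[tokens != ""]
counts <- sort(table(tokens), decreasing = TRUE)
top <- counts[counts >= 2]  # keep only words that occur more than once

pdf(NULL)  # null graphics device; drop this line in an interactive session
# rev() puts the most frequent word at the top of the horizontal chart
bar_mids <- barplot(rev(top), horiz = TRUE, las = 1,
                    xlab = "Frequency", main = "Most frequent terms")
invisible(dev.off())
```

Horizontal bars keep long words readable on the axis. That becomes more useful once you plot 10 or more terms, as in the exercise below.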

We've created an object in your workspace called new_text containing several sentences.

Load the qdap package.
Print new_text to the console.
Create term_count consisting of the 10 most frequent terms in new_text.
Plot a bar chart with the results of term_count.
