
Quick taste of text mining

Sometimes we can find out the author's intent and main ideas just by looking at the most common words.

At its heart, bag-of-words text mining is a way to count terms, or n-grams, across a collection of documents. Consider the following sentences, which we've saved to an object called text and made available in your workspace:

text <- "Text mining usually involves the process of structuring the input text. The overarching goal is, essentially, to turn text into data for analysis, via the application of natural language processing (NLP) and analytical methods."

Manually counting the words in the sentences above is a pain! Fortunately, the qdap package offers a better alternative. You can easily find the 4 most frequent terms (including ties) in text by calling the freq_terms() function with text as the first argument and 4 as the second.

frequent_terms <- freq_terms(text, 4)
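To see roughly what this kind of term counting involves, here is a minimal base R sketch that counts words in the same text by hand. It is only an illustration and not how qdap is implemented: the tokenization rule (lowercase everything, treat any non-letter as a separator) is an assumption, and the names words and counts are made up for this example.

```r
# The same sentences as above.
text <- "Text mining usually involves the process of structuring the input text. The overarching goal is, essentially, to turn text into data for analysis, via the application of natural language processing (NLP) and analytical methods."

# Assumed tokenization: lowercase, replace non-letters with spaces, split on whitespace.
cleaned <- gsub("[^a-z ]", " ", tolower(text))
words <- strsplit(cleaned, "\\s+")[[1]]
words <- words[words != ""]

# Count each distinct word and sort from most to least frequent.
counts <- sort(table(words), decreasing = TRUE)

# The three clear leaders are "the" (4), "text" (3) and "of" (2);
# every other word appears once, so a fourth place is a many-way tie.
head(counts, 3)
```

The long tie at a count of 1 is why the phrase "including ties" matters when you ask for the top 4 terms.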

The frequent_terms object stores the most frequent terms and their counts. You can then make a bar chart simply by calling the plot function on the frequent_terms object.

plot(frequent_terms)
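If you want to see what such a chart boils down to, the sketch below draws a comparable bar chart with base R's barplot(). The data frame is a hand-built stand-in, not the object qdap returns: the column names WORD and FREQ are an assumption about its layout, and the three counts are simply those of "the", "text" and "of" in the text above.

```r
# Hand-built stand-in for a term-frequency table (hypothetical layout).
top_terms <- data.frame(
  WORD = c("the", "text", "of"),
  FREQ = c(4, 3, 2),
  stringsAsFactors = FALSE
)

# barplot() draws horizontal bars from the bottom up, so sort ascending
# to put the most frequent term at the top of the chart.
plot_data <- top_terms[order(top_terms$FREQ), ]

barplot(
  plot_data$FREQ,
  names.arg = plot_data$WORD,
  horiz = TRUE,
  las = 1,            # keep the term labels horizontal
  xlab = "Count",
  main = "Most frequent terms"
)
```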

This exercise is part of the course

Text Mining with Bag-of-Words in R


Exercise instructions

We've created an object in your workspace called new_text containing several sentences.

  • Load the qdap package.
  • Print new_text to the console.
  • Create term_count consisting of the 10 most frequent terms in new_text.
  • Plot a bar chart with the results of term_count.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load qdap
___

# Print new_text to the console
new_text

# Find the 10 most frequent terms: term_count
term_count <- ___

# Plot term_count
___