Quick taste of text mining
Sometimes we can find out the author's intent and main ideas just by looking at the most common words.
At its heart, bag-of-words text mining is a way to count terms, or n-grams, across a collection of documents. Consider the following sentences, which we've saved to text and made available in your workspace:
text <- "Text mining usually involves the process of structuring the input text. The overarching goal is, essentially, to turn text into data for analysis, via the application of natural language processing (NLP) and analytical methods."
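To make the counting idea concrete, here is a minimal base R sketch that tallies single words in the same sentences. It is only an illustration: the tokenization rule (lowercase everything, split on any non-letter character) is an assumption chosen for simplicity, and the names words and counts are made up for this example.

```r
# The example sentences from above.
text <- "Text mining usually involves the process of structuring the input text. The overarching goal is, essentially, to turn text into data for analysis, via the application of natural language processing (NLP) and analytical methods."

# Lowercase the text, then split on runs of non-letter characters.
# (Assumed tokenization rule; real tokenizers handle more cases.)
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[words != ""]  # drop any empty strings left by the split

# Tabulate the words and sort by descending frequency.
counts <- sort(table(words), decreasing = TRUE)

# Show the most frequent words.
head(counts, 4)
```

Even for two sentences this takes several steps, and the result depends on how punctuation and case are handled.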
Manually counting the words in the sentences above is a pain! Fortunately, the qdap package offers a better alternative. You can easily find the top 4 most frequent terms (including ties) in the text object by calling the freq_terms function and specifying 4.
frequent_terms <- freq_terms(text, 4)
The frequent_terms object stores the most frequent words and their counts. You can then make a bar chart simply by calling the plot function on the frequent_terms object.
plot(frequent_terms)
This exercise is part of the course
Text Mining with Bag-of-Words in R
Exercise instructions
We've created an object in your workspace called new_text containing several sentences.
- Load the qdap package.
- Print new_text to the console.
- Create term_count consisting of the 10 most frequent terms in new_text.
- Plot a bar chart with the results of term_count.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load qdap
___
# Print new_text to the console
new_text
# Find the 10 most frequent terms: term_count
term_count <- ___
# Plot term_count
___
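One way the blanks might be filled in is sketched below, following the freq_terms and plot calls shown earlier. The new_text string here is a hypothetical stand-in so the snippet runs on its own; the course workspace supplies its own new_text, which is not reproduced here. The sketch assumes the qdap package is installed.

```r
# Load qdap
library(qdap)

# Hypothetical stand-in for the workspace object (not the course's actual text).
new_text <- "Text mining turns text into data. Counting terms in text is a simple way to start mining text data."

# Print new_text to the console
new_text

# Find the 10 most frequent terms: term_count
term_count <- freq_terms(new_text, 10)

# Plot term_count
plot(term_count)
```

Because freq_terms includes ties, term_count can contain more than 10 rows when several terms share the same count.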