Chapter 1: Jumping into Text Mining with Bag-of-Words

In this chapter, you'll learn the basics of using the bag-of-words method for analyzing text data.

What is text mining?

Understanding text mining

Quick taste of text mining

Getting started

Load some text

Make the vector a VCorpus object (1)

Make the vector a VCorpus object (2)

Make a VCorpus from a data frame

Cleaning and preprocessing text

Common cleaning functions from tm

Cleaning with qdap

All about stop words

Intro to word stemming and stem completion

Word stemming and stem completion on a sentence

Apply preprocessing steps to a corpus

The TDM & DTM

Understanding TDM and DTM

Make a document-term matrix

Make a term-document matrix

Chapter 2: Word Clouds and More Interesting Visuals

This chapter will teach you how to visualize text data in a way that's both informative and engaging.

Common text mining visuals

Test your understanding of text mining

Frequent terms with tm

Frequent terms with qdap

Intro to word clouds

A simple word cloud

Stop words and word clouds

Plot the better word cloud

Improve word cloud colors

Use prebuilt color palettes

Other word clouds and word networks

Find common words

Visualize common words

Visualize dissimilar words

Polarized tag cloud

Visualize word networks

Teaser: simple word clustering

Chapter 3: Adding to Your TM Skills

In this chapter, you'll learn more basic text mining techniques based on the bag of words method.

Simple word clustering

Distance matrix and dendrogram

Make a dendrogram friendly TDM

Put it all together: a text-based dendrogram

Dendrogram aesthetics

Using word association

Getting past single words

N-gram tokenization

Changing n-grams

How do bigrams affect word clouds?

Different frequency criteria

Changing frequency weights

Capturing metadata in tm

Chapter 4: Battle of the Tech Giants for Talent

This chapter ties everything together with a case study in text mining for HR analytics.

Amazon vs. Google

Organizing a text mining project

Step 1: Problem definition

Step 2: Identifying the text sources

Step 3: Text organization

Text organization

Working with Google reviews

Steps 4 & 5: Feature extraction & analysis

Feature extraction & analysis: amzn_pros

Feature extraction & analysis: amzn_cons

amzn_cons dendrogram

Word association

Quick review of Google reviews

Cage match! Amazon vs. Google pro reviews

Cage match, part 2! Negative reviews

Step 6: Reach a conclusion

Draw conclusions, insights, or recommendations

Draw another conclusion, insight, or recommendation

Finished!

Datasets

Coffee tweets

Chardonnay tweets

Anonymous online reviews: Amazon

Anonymous online reviews: Google

It is estimated that over 70% of potentially usable business information is unstructured, often in the form of text data. Text mining provides a collection of techniques that allows us to derive actionable insights from unstructured data. In this course, we explore the basics of text mining using the bag of words method. The first three chapters introduce a variety of essential topics for analyzing and visualizing text data. The final chapter allows you to apply everything you've learned in a real-world case study to extract insights from employee reviews of two major tech companies.
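
As a rough illustration of what the bag-of-words method does, the sketch below counts words per document using base R only. The three example sentences and all variable names are made up for this illustration; the course itself works with the tm and qdap packages, which are not used here.

```r
# Three toy documents (hypothetical, for illustration only).
docs <- c("Coffee is great", "I love coffee and tea", "Tea time")

# Lowercase, strip punctuation, and split each document on whitespace.
tokens <- strsplit(tolower(gsub("[[:punct:]]", "", docs)), "\\s+")

# Vocabulary: every distinct word across all documents.
vocab <- sort(unique(unlist(tokens)))

# Document-term matrix: one row per document, one column per word.
# Word order within a document is discarded; only counts are kept.
dtm <- t(vapply(
  tokens,
  function(words) tabulate(match(words, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
dimnames(dtm) <- list(paste0("doc", seq_along(docs)), vocab)

# Overall term frequencies, most frequent first.
term_freq <- sort(colSums(dtm), decreasing = TRUE)

print(dtm)
print(term_freq)
```

Transposing `dtm` with `t(dtm)` gives the term-document layout, with words as rows and documents as columns.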

Prerequisite: Intermediate R

Text Mining with Bag-of-Words in R
