Because text is unstructured data, it takes some wrangling to get it into a form you can analyze. In this chapter, you will learn how to add structure to text by tokenizing, cleaning, and treating text as categorical data.

Text as data

Airline tweets data

Grouped summaries

Counting categorical data

Counting user types

Summarizing user types

Tokenizing and cleaning

Tokenizing and counting

Cleaning and counting

Wrangling Text
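
The tokenizing, cleaning, and counting workflow this chapter outlines can be sketched as follows. This is a minimal illustration, assuming the dplyr and tidytext packages are installed; the tiny `tweets` table and its column names (`tweet_id`, `complaint_label`, `tweet_text`) are hypothetical stand-ins, not the course's actual airline tweets data.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for a tweets dataset: one row per tweet.
tweets <- tibble(
  tweet_id = 1:3,
  complaint_label = c("Complaint", "Complaint", "Non-Complaint"),
  tweet_text = c(
    "My flight was delayed and my bag is lost",
    "Delayed again, worst flight ever",
    "Thanks for the smooth flight today"
  )
)

word_counts <- tweets %>%
  unnest_tokens(word, tweet_text) %>%        # tokenize: one lowercase word per row
  anti_join(stop_words, by = "word") %>%     # clean: drop common stop words
  count(complaint_label, word, sort = TRUE)  # count words as categorical data

print(word_counts)
```

Once each word sits in its own row, it behaves like any other categorical column, so the usual grouped summaries and counts apply without special handling.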

While counts are nice, visualizations are better. In this chapter, you will learn how to apply what you know from ggplot2 to tidy text data.

Plotting word counts

Visualizing complaints

Visualizing non-complaints

Improving word count plots

Adding custom stop words

Visualizing word counts using factors

Faceting word count plots

Counting by product and reordering

Visualizing word counts with facets

Plotting word clouds

Creating a word cloud

Adding a splash of color

Visualizing Text

While word counts and visualizations suggest something about the content, we can do more. In this chapter, we move beyond word counts alone to analyze the sentiment or emotional valence of text.

Sentiment dictionaries

Counting the NRC sentiments

Visualizing the NRC sentiments

Appending dictionaries

Counting sentiment

Visualizing sentiment

Improving sentiment analysis

Practicing reshaping data

Practicing with grouped summaries

Visualizing sentiment by complaint type

Sentiment Analysis
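
Dictionary-based sentiment analysis of the kind this chapter describes amounts to joining tokenized words against a word-to-sentiment lookup table and counting the matches. The sketch below assumes the dplyr and tidytext packages; the `reviews` table and the four-word `toy_dictionary` are hypothetical and exist only to keep the example self-contained.

```r
library(dplyr)
library(tidytext)

# Hypothetical reviews: one row per document.
reviews <- tibble(
  review_id = 1:2,
  text = c(
    "I love this vacuum, it works great",
    "Terrible battery and awful noise"
  )
)

# Hypothetical toy dictionary; published dictionaries cover thousands of words.
toy_dictionary <- tibble(
  word = c("love", "great", "terrible", "awful"),
  sentiment = c("positive", "positive", "negative", "negative")
)

sentiment_counts <- reviews %>%
  unnest_tokens(word, text) %>%                 # one lowercase word per row
  inner_join(toy_dictionary, by = "word") %>%   # keep only words found in the dictionary
  count(review_id, sentiment)                   # tally sentiment per review

print(sentiment_counts)
```

In practice a published dictionary would replace the toy one. tidytext's `get_sentiments()` can load several, though some of them require a separate download.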

In this final chapter, we move beyond word counts to uncover the underlying topics in a collection of documents. We will use a standard topic model known as latent Dirichlet allocation (LDA).

Latent Dirichlet allocation

Topics as word probabilities

Summarizing topics

Visualizing topics

Document term matrices

Creating a DTM

Evaluating a DTM as a matrix

Running topic models

Fitting an LDA

Tidying LDA output

Comparing LDA output

Interpreting topics

Naming three topics

Naming four topics

Wrap-up

Topic Modeling

Airline tweets

Roomba reviews

From social media to product reviews, text is an increasingly important type of data across applications, including marketing analytics. In many instances, text is replacing other forms of unstructured data because it is inexpensive and current. However, to take advantage of everything that text has to offer, you need to know how to think about, clean, summarize, and model text. In this course, you will use the latest tidy tools to get started with text quickly and easily. You will learn how to wrangle and visualize text, perform sentiment analysis, and run and interpret topic models.

Introduction to the Tidyverse

Find out how to analyze text data using the tidy framework in R. You'll learn to wrangle and visualize text, perform sentiment analysis, and run topic models.

Introduction to Text Analysis in R

Analyze text data in R using the tidy framework.

Marketing Analytics in R

Text Mining in R
