Refresher on the text mining workflow

1. Refresher on the text mining workflow

The last chapter! Thanks for sticking with it this far!

2. So far ...

In chapter 1, you learned about qdap’s polarity() function, a basic subjectivity lexicon, and valence shifters. Next, you learned a bit about the tidyverse while doing inner joins with three subjectivity lexicons. In the last chapter, you did more sentiment analysis and added a layer of code for creating visuals.

3. Case study

In this last chapter, you apply your new skills to a real case study. Specifically, you have a posh apartment in Boston and you are thinking about renting it out on a popular sharing platform. Before doing so, you want to know what makes for a good rental experience.

4. The text mining workflow

Remember the text mining workflow from the bag of words course? In it, you learned about this mental map, which shows text mining as the process of going from an unorganized state to an organized state. The sentiment analyses you learned in this course are techniques that belong to the feature extraction step of this workflow.

5. 6 defined steps

More concretely, the text mining workflow encompasses six steps. First, define the problem and the specific goals of the project. Without this, you could disappoint yourself or your stakeholders. Second, identify the text to be analyzed. There is so much text available that you need to clearly frame your source and make sure you understand the way language is used there. For example, legal documents use different terms than Twitter, which is altogether different from Wikipedia pages, and so on. Third, organize the text. In the previous text mining course, you used a TDM or DTM. In this course, you analyzed vectors and TDMs, and also added the tidy data object called a tibble. Fourth, extract features from the text. This is the step in which you examine the sentiment and polarity of the text. The words matched against the subjectivity lexicons and the polarity scores are features you extracted from the data. Fifth, analyze the resulting artifacts, such as visuals, frequency values, or summary statistics of the extracted features. Lastly, tie the entire process back to the original problem statement. Following these steps will improve your chances of a successful text mining project, including sentiment analysis.
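As a rough illustration, steps three through five might look like the sketch below in the tidy style used earlier in the course. The reviews and the six-word lexicon are made up for this example (they are assumptions, not course data). In practice, a lexicon such as the one returned by tidytext's get_sentiments("bing") would take the place of the toy lexicon.

```r
library(dplyr)     # also provides tibble() and %>%
library(tidytext)  # unnest_tokens()

# Hypothetical reviews standing in for the text identified in step 2
reviews <- tibble(
  id = 1:3,
  comments = c(
    "Great location and a clean, comfortable room",
    "The apartment was dirty and the street was noisy",
    "Lovely host, great breakfast"
  )
)

# Toy subjectivity lexicon (an assumption for illustration only)
toy_lexicon <- tibble(
  word = c("great", "clean", "comfortable", "lovely", "dirty", "noisy"),
  sentiment = c("positive", "positive", "positive", "positive",
                "negative", "negative")
)

# Step 3: organize the text as a tibble with one lowercased word per row
tidy_reviews <- reviews %>%
  unnest_tokens(word, comments)

# Step 4: extract features by keeping only words found in the lexicon
review_sentiment <- tidy_reviews %>%
  inner_join(toy_lexicon, by = "word")

# Step 5: analyze the features, here as word counts per review and sentiment
sentiment_counts <- review_sentiment %>%
  count(id, sentiment)

print(sentiment_counts)
```

The inner join is what does the feature extraction here: any word that is missing from the lexicon simply drops out, so only sentiment-bearing words remain to be counted.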

6. Step 1: Define your problem

The first step is to define your problem statement or what insights you hope to gain from the text mining project. In the next exercise you will be asked to identify an appropriate problem statement. Keep in mind you want to know about rental properties to see how your apartment measures up.

7. Step 2: ID your text

Step 2 is to identify your text sources. You could use any free-form text that you think may help you answer your problem statement. However, we’ve loaded a thousand property rentals for you. All you need to do is load them and examine them. In the last exercise of this section, you will quickly compute a polarity score for the rental reviews and make a plot. This can help you understand generalities about the text you are going to analyze further.
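To give a feel for that quick first pass, here is a minimal base R sketch that examines a small data frame, scores each review, and plots the scores. Everything in it is an assumption for illustration: the three reviews, the tiny word lists, and the simple_polarity() helper, which is a simplified stand-in and not qdap’s polarity() function. It takes positive minus negative word counts and divides by the square root of the word count, and it ignores valence shifters entirely.

```r
# Hypothetical reviews; the course data set is not reproduced here
reviews <- data.frame(
  id = 1:3,
  comments = c(
    "Great location and a clean comfortable room",
    "The apartment was dirty and the street was noisy",
    "We arrived on Tuesday"
  ),
  stringsAsFactors = FALSE
)

# Examine the structure of the data before analyzing it
str(reviews)

# Toy word lists (assumptions); a real subjectivity lexicon is far larger
positive_words <- c("great", "clean", "comfortable")
negative_words <- c("dirty", "noisy")

# Simplified polarity: (positive - negative) / sqrt(number of words)
simple_polarity <- function(text) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[words != ""]
  if (length(words) == 0) return(0)
  n_pos <- sum(words %in% positive_words)
  n_neg <- sum(words %in% negative_words)
  (n_pos - n_neg) / sqrt(length(words))
}

# Score every review
reviews$polarity <- vapply(reviews$comments, simple_polarity,
                           numeric(1), USE.NAMES = FALSE)

# Plot the distribution of scores to see the general tone of the reviews
hist(reviews$polarity,
     main = "Polarity of rental reviews",
     xlab = "Polarity")
```

With real reviews, a plot like this quickly shows whether the text skews positive or negative overall, which is worth knowing before digging into individual words.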

8. Let's practice!