Step 3: Organize (& clean) the text

1. Step 3: Organize (& clean) the text

In this course you have applied functions to both the term document matrix and the tidy text tibble format. So in step 3 of the text mining workflow, organizing and cleaning, you will create both data types and perform a simple polarity scoring on each. When I do sentiment analysis, polarity scoring is something I almost always start with as I get familiar with the data.

2. Get to it!

In the first exercise you will apply qdap's polarity() function to the rental reviews and add the resulting scores as a column alongside the review text. Then you will subset the text into positive and negative reviews. These subsets will be collapsed into two large documents, the same as in chapter 3. You will close out the first exercise by creating a term document matrix with minimal preprocessing.
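As a rough sketch of that first exercise, the code below scores a few made-up reviews, splits them by polarity, collapses each side into one document, and builds a term document matrix. The `reviews` data frame and its `comments` column are hypothetical stand-ins for the rental reviews, and the preprocessing options are only one reasonable choice of "minimal".

```r
library(qdap)
library(tm)

# Hypothetical stand-in for the rental reviews (not the course data)
reviews <- data.frame(
  comments = c(
    "The apartment was wonderful and the host was great",
    "Terrible stay, the room was dirty and noisy",
    "Lovely location, clean and comfortable"
  ),
  stringsAsFactors = FALSE
)

# Score each review; the per-review scores live in pol$all$polarity
pol <- polarity(reviews$comments)
reviews$polarity <- pol$all$polarity

# Subset the text into positive and negative reviews
pos_comments <- reviews$comments[reviews$polarity > 0]
neg_comments <- reviews$comments[reviews$polarity < 0]

# Collapse each subset into a single large document
pos_terms <- paste(pos_comments, collapse = " ")
neg_terms <- paste(neg_comments, collapse = " ")

# Build a two-document corpus and a TDM with minimal preprocessing
all_corpus <- tm::VCorpus(tm::VectorSource(c(pos_terms, neg_terms)))
all_tdm <- tm::TermDocumentMatrix(
  all_corpus,
  control = list(
    removePunctuation = TRUE,
    stopwords = tm::stopwords("en")
  )
)
```

Reviews with a polarity of exactly zero fall into neither subset here; whether to keep them is a judgment call for your own analysis.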

3. More organization

In the second exercise you will create the tidy text data format. Specifically, you will apply the unnest_tokens() function to the Boston rental reviews. Since this is a tidy exercise, you will use the pipe operator to forward objects to each function. You will also add a column with mutate() and seq_along() to capture the word order. Depending on your analysis, this may be information you want to retain, so I almost always add this column during a tidy text analysis. The second exercise ends with you removing stopwords through an anti join. This should be old hat by now!
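A minimal sketch of that tidy workflow might look like the following. The `reviews` tibble, its `id` and `comments` columns, and the `original_word_order` column name are assumptions made for illustration, and the word order is tracked per review here, which may differ from how you want to index it.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the Boston rental reviews
reviews <- tibble(
  id = 1:2,
  comments = c(
    "The apartment was wonderful and the host was lovely",
    "The stay was terrible because the street was noisy"
  )
)

# One row per word, with a column recording each word's position
tidy_reviews <- reviews %>%
  unnest_tokens(word, comments) %>%
  group_by(id) %>%
  mutate(original_word_order = seq_along(word)) %>%
  ungroup()

# Drop stopwords using the stop_words data shipped with tidytext
tidy_reviews_without_stopwords <- tidy_reviews %>%
  anti_join(stop_words, by = "word")
```

Because the word order is recorded before the anti join, the gaps left by removed stopwords stay visible in `original_word_order`.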

4. Tidy text polarity scoring

Step 3 will close out with a simple polarity scoring using the tidy text sentiments data frame. To begin, you will call the data from the package. Next, you will subset the tidy lexicon data to Bing. Then, using some of the example code, you will fill in the code to perform the inner join and finally mutate a new column where the difference between positive and negative is calculated. In the end you will have a qdap polarity-based object and, in another object for comparison, the tidy polarity scores from the inner join. While the two are not going to be the same, you may find that they point in the same direction.
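The inner join and polarity calculation could be sketched roughly as below. This uses `get_sentiments("bing")` to obtain the Bing lexicon, which is one way to get it and may not match the exercise's exact code; the `reviews` tibble and the `id` column are hypothetical.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical stand-in for the tidy rental reviews
reviews <- tibble(
  id = 1:2,
  comments = c(
    "Wonderful apartment, great host, lovely view",
    "Terrible stay, awful smell, dirty kitchen"
  )
)

tidy_reviews <- reviews %>%
  unnest_tokens(word, comments)

# Bing lexicon: one row per word, labelled positive or negative
bing <- get_sentiments("bing")

# Keep only words found in the lexicon, count by sentiment per review,
# then take positive minus negative as a simple polarity score.
# Note: this assumes both sentiments occur somewhere in the data;
# otherwise one of the two columns would be missing after spread().
pol_scores <- tidy_reviews %>%
  inner_join(bing, by = "word") %>%
  count(id, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(polarity = positive - negative)
```

These counts are not on the same scale as qdap's polarity scores, so only the sign and relative ordering are worth comparing between the two objects.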

5. Let's practice!

Good luck in the exercises!