
Step 3: Text organization

1. Step 3: Text organization

In step 3, you organize your text, which begins with cleaning it.

2. Text organization with qdap

In this chapter, we wrap the qdap functions in a custom qdap_clean function. qdap functions can be applied directly to a text vector rather than to a corpus object. In qdap_clean, x is a vector of employee reviews; the first preprocessing step applies replace_abbreviation, the next applies replace_contraction, and so on.
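As a rough illustration, a function of this shape might look like the sketch below. Only replace_abbreviation and replace_contraction are named in the transcript, so only those two steps appear; the sample reviews vector is made up for the example, and the real function presumably continues with further qdap steps.

```r
library(qdap)

# Sketch of a qdap-based cleaner that works on a plain character vector.
# Only the two steps named in the transcript are shown.
qdap_clean <- function(x) {
  x <- replace_abbreviation(x)  # expand common abbreviations
  x <- replace_contraction(x)   # expand contractions such as "isn't"
  # ...further qdap steps would follow here (not specified in the transcript)
  return(x)
}

# Hypothetical employee reviews, for illustration only
reviews <- c("It isn't a bad place to work.",
             "Management doesn't listen to feedback.")

cleaned <- qdap_clean(reviews)
print(cleaned)
```

Because the function takes and returns a character vector, it can be run on a data frame column before any corpus is built.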

3. Text organization with tm

For the tm library, we have a slightly more familiar cleaning function, tm_clean. This function takes a VCorpus and applies removePunctuation, then stripWhitespace, and finally removeWords, which drops common English stop words along with the custom terms "Google", "amazon", and "company".
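A minimal sketch of such a function, following the order of steps described above, could look like this. The sample reviews are invented for the example, and using stopwords("en") for the "common English" words is an assumption about how the course builds its stop word list.

```r
library(tm)

# Sketch of a tm-based cleaner that works on a VCorpus.
tm_clean <- function(corpus) {
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, stripWhitespace)
  # stopwords("en") is assumed to be the "common English" word list
  corpus <- tm_map(corpus, removeWords,
                   c(stopwords("en"), "Google", "amazon", "company"))
  return(corpus)
}

# Hypothetical reviews, for illustration only
reviews <- c("Great company, smart people!",
             "The   hours are long.")

corp <- VCorpus(VectorSource(reviews))
cleaned_corp <- tm_clean(corp)
print(content(cleaned_corp[[1]]))
```

Note that removeWords matches case-sensitively, so each custom term is removed only where the text uses exactly that capitalization.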

4. Cleaning your corpora

In this section, you will clean four distinct corpora. To make them, you apply the cleaning functions to the Amazon pros and cons reviews; then you'll do the same for the Google pros and cons.
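One way to picture the workflow is the sketch below, which runs each review column through the vector-level cleaner, builds a VCorpus, and then applies the corpus-level cleaner. The data frames amzn and goog, their pros and cons columns, and the helper make_clean_corpus are hypothetical stand-ins, and the two cleaning functions are shortened versions written for this example rather than the course's exact definitions.

```r
library(qdap)
library(tm)

# Shortened stand-ins for the two cleaning functions described earlier
qdap_clean <- function(x) {
  x <- replace_abbreviation(x)
  x <- replace_contraction(x)
  return(x)
}

tm_clean <- function(corpus) {
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, stripWhitespace)
  corpus <- tm_map(corpus, removeWords,
                   c(stopwords("en"), "Google", "amazon", "company"))
  return(corpus)
}

# Hypothetical review data, for illustration only
amzn <- data.frame(
  pros = c("Good pay and benefits.", "You learn a lot."),
  cons = c("It isn't easy to take time off.", "Long hours."),
  stringsAsFactors = FALSE
)
goog <- data.frame(
  pros = c("Smart coworkers.", "Free food!"),
  cons = c("Promotions aren't quick.", "Too many meetings."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: text vector -> cleaned VCorpus
make_clean_corpus <- function(text) {
  tm_clean(VCorpus(VectorSource(qdap_clean(text))))
}

# Four corpora: pros and cons for each employer
corpora <- list(
  amzn_pros = make_clean_corpus(amzn$pros),
  amzn_cons = make_clean_corpus(amzn$cons),
  goog_pros = make_clean_corpus(goog$pros),
  goog_cons = make_clean_corpus(goog$cons)
)
print(names(corpora))
```

Keeping the four corpora separate makes it straightforward to compare pros against cons, or one employer against the other, in later steps.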

5. Let's practice!