Cleaning with qdap
The qdap package offers other text cleaning functions. Each is useful in its own way and is particularly powerful when combined with the others.
bracketX(): Remove all text within brackets (e.g. "It's (so) cool" becomes "It's cool")
replace_number(): Replace numbers with their word equivalents (e.g. "2" becomes "two")
replace_abbreviation(): Replace abbreviations with their full text equivalents (e.g. "Sr" becomes "Senior")
replace_contraction(): Convert contractions back to their base words (e.g. "shouldn't" becomes "should not")
replace_symbol(): Replace common symbols with their word equivalents (e.g. "$" becomes "dollar")
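The functions listed above can be tried on a single string before applying them to a real corpus. The following is a minimal sketch, assuming the qdap package is installed; sample_text and cleaned are hypothetical names chosen for illustration and are not the course's text object. What each call replaces depends on qdap's built-in dictionaries, so inspect the printed results rather than relying on a particular output.

```r
# Assumes qdap is installed, e.g. via install.packages("qdap")
library(qdap)

# Hypothetical sample sentence (not the course's `text` object)
sample_text <- "Dr. Smith shouldn't pay $5 (or so) for 2 coffees."

# Apply each cleaning function on its own and print the result
bracketX(sample_text)              # removes the bracketed text
replace_number(sample_text)        # spells out numbers as words
replace_abbreviation(sample_text)  # expands abbreviations found in qdap's dictionary
replace_contraction(sample_text)   # expands contractions found in qdap's dictionary
replace_symbol(sample_text)        # spells out common symbols as words

# Combine the functions by feeding each result into the next step
cleaned <- bracketX(sample_text)
cleaned <- replace_number(cleaned)
cleaned <- replace_abbreviation(cleaned)
cleaned <- replace_contraction(cleaned)
cleaned <- replace_symbol(cleaned)
cleaned
```

The order used in the combined version is one reasonable choice rather than a requirement. Removing bracketed text first simply means the later steps have less text to process.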
This exercise is part of the course
Text Mining with Bag-of-Words in R
Exercise instructions
Apply the following functions to the text object from the previous exercise:
bracketX()
replace_number()
replace_abbreviation()
replace_contraction()
replace_symbol()
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
## text is still loaded in your workspace
# Remove text within brackets
___
# Replace numbers with words
___
# Replace abbreviations
___
# Replace contractions
___
# Replace symbols with words
___