All about stop words
Some words are frequent but carry little information. These are called stop words, and you may want to remove them from your analysis. Common English stop words include "I", "she'll", and "the". The tm package ships with 174 common English stop words (you'll print them in this exercise!).
When you are doing an analysis, you will likely need to add to this list. In our coffee tweet example, all tweets contain "coffee", so it's important to pull out that word in addition to the common stop words. Leaving "coffee" in doesn't add any insight and will cause it to be overemphasized in a frequency analysis.
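To make the overemphasis concrete, here is a minimal base-R sketch. The three tweets and the short hand-picked stop list are invented for illustration; they are not the course's coffee data or tm's stop word list.

```r
# Hypothetical sample tweets (not the course's coffee data set)
tweets <- c(
  "coffee is the best way to start the day",
  "the new coffee shop has great espresso",
  "i need coffee before the meeting"
)

# Split on whitespace and count how often each word occurs
words <- unlist(strsplit(tolower(tweets), "\\s+"))
freq <- sort(table(words), decreasing = TRUE)
head(freq, 3)

# Drop a small hand-picked stop list, including the topic word "coffee"
custom_stops <- c("the", "is", "to", "i", "has", "coffee")
freq_clean <- sort(table(words[!words %in% custom_stops]), decreasing = TRUE)
head(freq_clean, 3)
```

In the first table, "the" and "coffee" crowd out every other term. Once they are removed, the remaining words compete on equal footing.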
The c() function lets you add new words to the stop words list. For example, the following adds "word1" and "word2" to the default list of English stop words:
all_stops <- c("word1", "word2", stopwords("en"))
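As a rough sketch of what that line produces, assuming the tm package is installed, you can compare the combined vector with the default list. The names default_stops and the extra "the" entry are only for illustration.

```r
library(tm)

# Default English stop words shipped with tm
default_stops <- stopwords("en")

# Prepend two placeholder words, as in the line above
all_stops <- c("word1", "word2", default_stops)

# The combined vector is two entries longer than the default list
length(all_stops) - length(default_stops)

# "the" is already a default stop word, so adding it again would duplicate it;
# unique() is one way to guard against that
"the" %in% default_stops
all_stops <- unique(c("the", all_stops))
```

Because the result is an ordinary character vector, the usual vector tools (%in%, unique(), length()) all apply to it.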
Once you have a list of stop words that makes sense, you will use the removeWords() function on your text. removeWords() takes two arguments: the text object to which it's being applied and the list of words to remove.
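A minimal sketch of that call on a plain character vector, assuming tm is installed. The sentence in example_text is made up (the exercise preloads its own text object), and lowercasing first is an extra step added here because removeWords() matches case-sensitively while tm's English stop words are lowercase.

```r
library(tm)

# Hypothetical example sentence, not the exercise's preloaded `text`
example_text <- "I think the coffee bean is the best part of my morning"

# First argument: the text; second argument: the words to remove
clean <- removeWords(tolower(example_text), stopwords("en"))

# Removed words leave extra spaces behind; stripWhitespace() collapses them
stripWhitespace(clean)
```

Note that removeWords() deletes only whole-word matches, so a stop word that merely appears inside a longer word is left alone.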
This exercise is part of the course “Text Mining with Bag-of-Words in R”.
Exercise instructions
- Review standard stop words by calling stopwords("en").
- Remove "en" stopwords from text.
- Add "coffee" and "bean" to the standard stop words, assigning to new_stops.
- Remove the customized stopwords, new_stops, from text.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
## text is preloaded into your workspace
# List standard English stop words
___
# Print text without standard stop words
removeWords(___, ___("___"))
# Add "coffee" and "bean" to the list: new_stops
new_stops <- c("___", "___", ___)
# Remove stop words from text
___