Remove stop words and additional spaces

A corpus of text usually contains many common words like "a", "an", "the", "of", and "but". In natural language processing, these are called stop words.
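
To get a feel for what counts as a stop word, a small sketch like the following can help. It assumes the tm package is installed and uses its `stopwords()` function with the English list; the variable names `en_stopwords` and `examples` are illustrative only and do not come from the course.

```r
library(tm)

# English stop word list shipped with tm
en_stopwords <- stopwords("english")

# Peek at the first few entries and the overall size of the list
head(en_stopwords)
length(en_stopwords)

# Check whether the example words from the text are on the list
examples <- c("a", "an", "the", "of", "but")
examples %in% en_stopwords
```

The exact contents of the list depend on the tm version installed, so it is worth inspecting it before relying on it.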

Stop words are usually removed during text processing so one can focus on more important words in the corpus to derive insights.

Removing special characters, punctuation, numbers, and stop words leaves extra spaces behind. These additional spaces also need to be removed from the corpus.
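
The two cleaning steps described above can be sketched on a toy corpus. This is a minimal illustration, not the exercise solution for the course data: the tweets and the names `tweets`, `toy_corpus`, `toy_stpwd`, and `toy_clean` are made up, and it assumes tm's `tm_map()` together with the `removeWords` and `stripWhitespace` transformations.

```r
library(tm)

# Hypothetical lower-case tweets standing in for the course corpus
tweets <- c("the launch of a new phone is great",
            "an update on the event   but no news")

# Build a small in-memory corpus from the character vector
toy_corpus <- Corpus(VectorSource(tweets))

# Remove English stop words; gaps remain where the words used to be
toy_stpwd <- tm_map(toy_corpus, removeWords, stopwords("english"))

# Collapse each run of whitespace into a single space
toy_clean <- tm_map(toy_stpwd, stripWhitespace)

# Compare the text before and after whitespace stripping
toy_stpwd$content
toy_clean$content
```

Note that `stripWhitespace` collapses repeated spaces but may leave a single leading or trailing space; base R's `trimws()` can be applied afterwards if that matters. Depending on the tm version, `tm_map()` on this kind of corpus may also emit a warning, which does not affect the result here.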

The corpus that you created in the last exercise has been pre-loaded as twt_corpus_lwr.

The library tm has been pre-loaded for this exercise.

This exercise is part of the course

Analyzing Social Media Data in R

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Remove English stop words from the corpus and view the corpus 
twt_corpus_stpwd <- ___(twt_corpus_lwr, ___, stopwords("___"))
head(twt_corpus_stpwd$content)