Text preprocessing: Stemming

The roots of words are often more important than their endings, especially when it comes to text analysis. The book Animal Farm is obviously about animals. However, knowing that the book mentions animal's 248 times and animal 107 times might not be helpful for your analysis.
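
As a small illustration of the idea, the sketch below stems a few word forms with SnowballC's wordStem(). The sample vector is made up for this example and is not taken from the book; the point is only that inflected forms collapse onto one shared stem.

```r
library(SnowballC)

# Made-up sample words (an assumption for illustration, not book data)
sample_words <- c("animal", "animals", "farm", "farms")

# wordStem() uses the Porter stemmer by default
sample_stems <- wordStem(sample_words)

# Pair each original word with its stem
data.frame(word = sample_words, stem = sample_stems)
```

Note that a stem does not have to be a dictionary word; it only has to be the same for related forms.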

tidy_animal_farm contains a tibble of the words from Animal Farm, tokenized and without stop words. The next step is to stem the words and explore the results.

This exercise is part of the course

Introduction to Natural Language Processing in R

Exercise instructions

  • Use dplyr and SnowballC to stem the words from tidy_animal_farm.
  • Print the old word frequencies from tidy_animal_farm.
  • Print the new word frequencies from stemmed_animal_farm.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Perform stemming on tidy_animal_farm
stemmed_animal_farm <- tidy_animal_farm %>%
  ___(word = ___(___))

# Print the old word frequencies 
___ %>%
  ___(word, sort = ___)

# Print the new word frequencies
___ %>%
  ___(word, sort = ___)
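
One possible way to work through the three steps is sketched below. Because the real tidy_animal_farm is only available inside the course environment, this sketch builds a tiny hypothetical stand-in with the same shape (one token per row in a word column); the counts it prints are therefore not the book's counts. It assumes mutate() and count() from dplyr and wordStem() from SnowballC are the intended functions, which matches the instructions but is not confirmed by the scaffold itself.

```r
library(dplyr)
library(tibble)
library(SnowballC)

# Hypothetical stand-in for tidy_animal_farm (NOT the real course data)
tidy_animal_farm <- tibble(
  word = c("animal", "animals", "animals", "farm", "farms", "pig")
)

# Replace each word with its stem
stemmed_animal_farm <- tidy_animal_farm %>%
  mutate(word = wordStem(word))

# Word frequencies before stemming, most frequent first
old_counts <- tidy_animal_farm %>%
  count(word, sort = TRUE)
print(old_counts)

# Word frequencies after stemming, most frequent first
new_counts <- stemmed_animal_farm %>%
  count(word, sort = TRUE)
print(new_counts)
```

After stemming, related forms are counted together, so the frequency table has fewer rows and larger counts per row.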