
Intro to word stemming and stem completion

Another useful preprocessing step involves word stemming and stem completion. Word stemming reduces words to a common root so that variants of the same word are counted as a single term across documents. For example, "computational", "computers", and "computation" all share the stem "comput". But because "comput" isn't a real word, we want to reconstruct it into a recognizable word, such as "computer". This reconstruction step is called stem completion.

The tm package provides the stemDocument() function to reduce words to their stems. It accepts either a character vector or a PlainTextDocument and returns an object of the same type.

For example,

stemDocument(c("computational", "computers", "computation"))

returns "comput" "comput" "comput".
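The PlainTextDocument case can be sketched the same way. This is a minimal illustration, assuming the tm package is installed. The names doc, stemmed_doc, and stems are hypothetical, and each word is placed in its own element of the document's content so that each one is stemmed separately.

```r
library(tm)

# Hypothetical document with one word per content element
doc <- PlainTextDocument(c("computational", "computers", "computation"))

# stemDocument() hands back a document of the same class
stemmed_doc <- stemDocument(doc)

# Pull the stemmed text out as a character vector
stems <- content(stemmed_doc)
stems
```

content() is used here only to inspect the result; the stemmed document itself can be passed on to other tm functions.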

You will use stemCompletion() to reconstruct these word roots back into a known term. stemCompletion() accepts a character vector of stems and a completion dictionary, which can be either a character vector or a Corpus object. Either way, the completion dictionary for our example would need to contain the word "computer" so that every instance of "comput" can be reconstructed.
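As a rough sketch of the two dictionary forms described above, the example below completes the "comput" stems once with a character vector and once with a small corpus. It assumes the tm package is installed; the names stems, dict_vec, dict_corpus, completed_vec, and completed_corpus are hypothetical and chosen only for illustration.

```r
library(tm)

# Stem the three example words
stems <- stemDocument(c("computational", "computers", "computation"))

# Option 1: the completion dictionary is a character vector
dict_vec <- c("computer")
completed_vec <- stemCompletion(stems, dict_vec)

# Option 2: the completion dictionary is a corpus holding the same word
dict_corpus <- VCorpus(VectorSource("computer"))
completed_corpus <- stemCompletion(stems, dict_corpus)

# The result is a named character vector; the names are the original stems
completed_vec
```

Because the result keeps the stems as names, unname(completed_vec) may be handy when only the completed words are needed.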

This is a part of the course

“Text Mining with Bag-of-Words in R”


Exercise instructions

  • Create a vector called complicate consisting of the words "complicated", "complication", and "complicatedly" in that order.
  • Store the stemmed version of complicate in an object called stem_doc.
  • Create comp_dict that contains one word, "complicate".
  • Create complete_text by applying stemCompletion() to stem_doc. Re-complete the words using comp_dict as the reference corpus.
  • Print complete_text to the console.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create complicate
complicate <- ___

# Perform word stemming: stem_doc
stem_doc <- ___

# Create the completion dictionary: comp_dict
comp_dict <- ___

# Perform stem completion: complete_text 
complete_text <- ___

# Print complete_text
complete_text
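
One possible way to fill in the blanks is sketched below. It follows the exercise instructions step by step and assumes the tm package is installed; it is an illustration rather than the official solution.

```r
library(tm)

# Create complicate
complicate <- c("complicated", "complication", "complicatedly")

# Perform word stemming: stem_doc
stem_doc <- stemDocument(complicate)

# Create the completion dictionary: comp_dict
comp_dict <- c("complicate")

# Perform stem completion: complete_text
complete_text <- stemCompletion(stem_doc, comp_dict)

# Print complete_text
complete_text
```

Each stem in stem_doc is matched against the start of the dictionary entries, so all three words are completed to the single dictionary word.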

