ComenzarEmpieza gratis

Intro to word stemming and stem completion

Still, another useful preprocessing step involves word-stemming and stem completion. Word stemming reduces words to unify across documents. For example, the stem of "computational", "computers" and "computation" is "comput". But because "comput" isn't a real word, we want to reconstruct the words so that "computational", "computers", and "computation" all refer to a recognizable word, such as "computer". The reconstruction step is called stem completion.

The tm package provides the stemDocument() function to get to a word's root. This function either takes in a character vector and returns a character vector, or takes in a PlainTextDocument and returns a PlainTextDocument.

For example,

stemDocument(c("computational", "computers", "computation"))

returns "comput" "comput" "comput".

You will use stemCompletion() to reconstruct these word roots back into a known term. stemCompletion() accepts a character vector and a completion dictionary. The completion dictionary can be a character vector or a Corpus object. Either way, the completion dictionary for our example would need to contain the word "computer," so all instances of "comput" can be reconstructed.

Este ejercicio forma parte del curso

Text Mining with Bag-of-Words in R

Ver curso

Instrucciones del ejercicio

  • Create a vector called complicate consisting of the words "complicated", "complication", and "complicatedly" in that order.
  • Store the stemmed version of complicate to an object called stem_doc.
  • Create comp_dict that contains one word, "complicate".
  • Create complete_text by applying stemCompletion() to stem_doc. Re-complete the words using comp_dict as the reference corpus.
  • Print complete_text to the console.

Ejercicio interactivo práctico

Prueba este ejercicio completando el código de muestra.

# Create complicate
complicate <- ___

# Perform word stemming: stem_doc
stem_doc <- ___

# Create the completion dictionary: comp_dict
comp_dict <- ___

# Perform stem completion: complete_text 
complete_text <- ___

# Print complete_text
complete_text
Editar y ejecutar código