
Lemmatization with spaCy

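In this exercise, you will practice lemmatization. Lemmatization reduces inflected or derived words to their base (dictionary) form, called the lemma. spaCy assigns one lemma to each token, so the list of lemmas has the same length as the list of tokens. Because several word forms can share a lemma, we expect the number of distinct lemmas in a sentence to be less than or equal to the number of distinct tokens.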
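The counting argument above can be illustrated with a small sketch. It does not use spaCy: `TOY_LEMMAS`, `toy_lemmatize`, and the sample sentence are hypothetical names and data introduced only for illustration, and the lookup table stands in for a real lemmatizer.

```python
# Toy illustration only: a hand-written lookup table, NOT spaCy's lemmatizer.
# TOY_LEMMAS and toy_lemmatize are hypothetical names used for this sketch.
TOY_LEMMAS = {
    "buys": "buy",
    "bought": "buy",
    "cans": "can",
}


def toy_lemmatize(tokens):
    """Return one lemma per token, falling back to the lowercased token."""
    return [TOY_LEMMAS.get(tok.lower(), tok.lower()) for tok in tokens]


sample_tokens = ["She", "buys", "cans", "and", "he", "bought", "a", "can"]
sample_lemmas = toy_lemmatize(sample_tokens)

# One lemma per token, so both lists have the same length.
print(len(sample_tokens), len(sample_lemmas))

# Several forms collapse onto one lemma, so there are fewer distinct lemmas.
print(len(set(sample_tokens)), len(set(sample_lemmas)))
```

In this sketch "buys" and "bought" both map to "buy", and "cans" and "can" both map to "can", which is why the set of lemmas is smaller than the set of tokens even though the two lists are equally long.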

The first Amazon food review is provided for you in a string called text. The en_core_web_sm model is loaded as nlp and has already been run on text to produce document, a Doc container for the text string.

tokens, a list containing the tokens of the text, is also already loaded for your use.

This exercise is part of the course

Natural Language Processing with spaCy


Exercise instructions

  • Append the lemma for all tokens in the document, then print the list of lemmas.
  • Print the tokens list and observe the differences between tokens and lemmas.

Interactive exercise

Try this exercise by completing the sample code.

document = nlp(text)
tokens = [token.text for token in document]

# Append the lemma for all tokens in the document
lemmas = [token.____ for token in document]
print("Lemmas:\n", ____, "\n")

# Print tokens and compare with lemmas list
print("Tokens:\n", ____)