Lemmatization with spaCy
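In this exercise, you will practice lemmatization. Lemmatization reduces derived or inflected words to their root (dictionary) form. Each token gets exactly one lemma, so the list of lemmas has the same length as the list of tokens. Because several word forms can share a root, we generally expect the number of distinct lemmas in a sentence to be less than or equal to the number of distinct tokens.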
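To make the counting intuition concrete, here is a minimal sketch that does not use spaCy at all. The lookup table TOY_LEMMAS and the helper toy_lemmatize are hypothetical stand-ins invented for illustration; a real lemmatizer uses far richer rules and context. The sketch only shows that mapping several word forms to one root can shrink the set of distinct values while keeping one lemma per token.

```python
# Hypothetical toy lookup table; NOT spaCy's lemmatizer, for illustration only.
TOY_LEMMAS = {
    "bought": "buy",
    "buying": "buy",
    "buys": "buy",
    "was": "be",
    "is": "be",
    "treats": "treat",
}


def toy_lemmatize(tokens):
    """Return one lemma per token, falling back to the lowercased token."""
    return [TOY_LEMMAS.get(tok.lower(), tok.lower()) for tok in tokens]


tokens = "She buys treats and he bought treats".split()
lemmas = toy_lemmatize(tokens)

# One lemma per token, so both lists have the same length.
print("Tokens:", len(tokens), "Lemmas:", len(lemmas))

# Different forms collapse onto shared roots, so there are fewer distinct lemmas.
print("Distinct tokens:", len(set(tokens)), "Distinct lemmas:", len(set(lemmas)))
```

Here "buys" and "bought" both map to "buy", which is why the distinct-lemma count is smaller than the distinct-token count.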
The first Amazon food review is provided for you in a string called text. The en_core_web_sm model is loaded as nlp and has been run on text to create document, a Doc container for the text string. tokens, a list containing the tokens of the text, is also already loaded for your use.
This exercise is part of the course Natural Language Processing with spaCy
Exercise instructions
- Append the lemma for all tokens in the document, then print the list of lemmas.
- Print the tokens list and observe the differences between tokens and lemmas.
Hands-on interactive exercise
Try this exercise by completing the sample code.
document = nlp(text)
tokens = [token.text for token in document]
# Append the lemma for all tokens in the document
lemmas = [token.____ for token in document]
print("Lemmas:\n", ____, "\n")
# Print tokens and compare with lemmas list
print("Tokens:\n", ____)