Adding pipes in spaCy
You often use an existing spaCy model for different NLP tasks. However, in some cases an off-the-shelf pipeline component, such as the one that performs sentence segmentation, takes a long time to produce the expected results. In this exercise, you'll practice adding a pipeline component to a spaCy model (text processing pipeline).
You will use the first five reviews from the Amazon Fine Food Reviews dataset for this exercise. You can access these reviews by using the texts string.
The spaCy package is already imported for you to use.
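As a minimal sketch of what adding a component does, the snippet below inspects the pipeline of a blank English model before and after a sentencizer is added. It assumes spaCy v3, where add_pipe accepts the component's string name; the variable names names_before and names_after are illustrative only.

```python
import spacy

# A blank English model has a tokenizer but no pipeline components yet.
nlp = spacy.blank("en")
names_before = list(nlp.pipe_names)

# Add the rule-based sentence segmenter by its registered string name (spaCy v3 style).
nlp.add_pipe("sentencizer")
names_after = list(nlp.pipe_names)

print("Before:", names_before)
print("After:", names_after)
```

Checking nlp.pipe_names this way is a quick way to confirm that the component was registered before any text is processed.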
This exercise is part of the course
Natural Language Processing with spaCy
Exercise instructions
- Load a blank spaCy English model and add a sentencizer component to the model.
- Create a Doc container for the texts, create a list to store sentences of the given document, and print its number of sentences.
- Print the list of tokens in the second sentence from the sentences list.
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Load a blank spaCy English model and add a sentencizer component
nlp = spacy.____("en")
nlp.____("sentencizer")
# Create a Doc container, store its sentences and print the number of sentences
doc = ____
sentences = [____ for s in ____]
print("Number of sentences: ", len(____), "\n")
# Print the list of tokens in the second sentence
print("Second sentence tokens: ", [____ for ____ in sentences[1]])
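One possible way to fill in the blanks is sketched below. It is not the course's official solution. Because the exercise preloads texts with the actual reviews, the string used here is a made-up stand-in, and the loop variable name token is an arbitrary choice.

```python
import spacy

# Hypothetical stand-in for the preloaded `texts` string; the real exercise
# supplies the first five Amazon Fine Food Reviews instead.
texts = "I bought this coffee last week. It tastes great! Would I order it again? Yes."

# Load a blank spaCy English model and add a sentencizer component
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Create a Doc container, store its sentences and print the number of sentences
doc = nlp(texts)
sentences = [s for s in doc.sents]
print("Number of sentences: ", len(sentences), "\n")

# Print the list of tokens in the second sentence
print("Second sentence tokens: ", [token for token in sentences[1]])
```

Each item in sentences is a Span, so iterating over sentences[1] yields its individual Token objects.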