Adding pipes in spaCy
You often use an existing spaCy model for different NLP tasks. However, in some cases, an off-the-shelf pipeline component such as sentence segmentation will take long times to produce expected results. In this exercise, you'll practice adding a pipeline component to a spaCy model (text processing pipeline).
You will use the first five reviews from the Amazon Fine Food Reviews dataset for this exercise. You can access these reviews by using the texts
string.
The spaCy
package is already imported for you to use.
Diese Übung ist Teil des Kurses
Natural Language Processing with spaCy
Anleitung zur Übung
- Load a blank
spaCy
English model and add asentencizer
component to the model. - Create a
Doc
container for thetexts
, create a list to storesentences
of the given document and print its number of sentences. - Print the list of tokens in the second sentence from the
sentences
list.
Interaktive Übung
Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.
# Load a blank spaCy English model and add a sentencizer component
nlp = spacy.____("en")
nlp.____("sentencizer")
# Create Doc containers, store sentences and print its number of sentences
doc = ____
sentences = [____ for s in ____]
print("Number of sentences: ", len(____), "\n")
# Print the list of tokens in the second sentence
print("Second sentence tokens: ", [____ for ____ in sentences[1]])