Adding pipes in spaCy

You often use an existing spaCy model for different NLP tasks. However, in some cases, an off-the-shelf pipeline component such as sentence segmentation can take a long time to produce the expected results. In this exercise, you'll practice adding a pipeline component to a spaCy model (a text processing pipeline).

You will use the first five reviews from the Amazon Fine Food Reviews dataset for this exercise. You can access these reviews by using the texts string.

The spaCy package is already imported for you to use.
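
To make the idea of "adding a pipeline component" concrete, here is a minimal sketch, assuming spaCy v3, where built-in components are added by their registered string name. It only inspects the pipeline and does not touch the exercise data.

```python
import spacy

# A blank English pipeline has a tokenizer but no other components.
nlp = spacy.blank("en")
print(nlp.pipe_names)  # []

# Add the rule-based sentence splitter by its registered name.
nlp.add_pipe("sentencizer")
print(nlp.pipe_names)  # ['sentencizer']
```

Because the sentencizer splits on punctuation rather than relying on a trained parser, it needs no downloaded model, which is one reason it can be a faster choice when only sentence boundaries are required.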

This exercise is part of the course

Natural Language Processing with spaCy

Exercise instructions

  • Load a blank spaCy English model and add a sentencizer component to the model.
  • Create a Doc container for texts, build a list of the sentences in the document, and print the number of sentences.
  • Print the list of tokens in the second sentence from the sentences list.
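
The steps above can be sketched on a small made-up string. This is an illustrative sketch, assuming spaCy v3; `sample_text` and `second_tokens` are hypothetical names, and the sample sentence is not taken from the Amazon Fine Food Reviews data.

```python
import spacy

# Blank English pipeline plus the rule-based sentence splitter.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical sample text, used here instead of the exercise's `texts`.
sample_text = "I love this coffee. It arrived quickly! Would I buy it again?"

# Calling the pipeline on a string returns a Doc; doc.sents yields sentence Spans.
doc = nlp(sample_text)
sentences = [s for s in doc.sents]
print("Number of sentences: ", len(sentences))

# Each sentence is a Span of tokens; token.text gives the token's string.
second_tokens = [token.text for token in sentences[1]]
print("Second sentence tokens: ", second_tokens)
```

Storing the sentences in a list is what makes indexing such as `sentences[1]` possible, since `doc.sents` itself is a generator.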

Interactive exercise

Try this exercise by completing the sample code below.

# Load a blank spaCy English model and add a sentencizer component
nlp = spacy.____("en")
nlp.____("sentencizer")

# Create a Doc container, store its sentences and print the number of sentences
doc = ____
sentences = [____ for s in ____]
print("Number of sentences: ", len(____), "\n")

# Print the list of tokens in the second sentence
print("Second sentence tokens: ", [____ for ____ in sentences[1]])