Adding pipes in spaCy

You often use an existing spaCy model for different NLP tasks. However, in some cases, an off-the-shelf pipeline component such as sentence segmentation can take a long time to produce the expected results. In this exercise, you'll practice adding a pipeline component to a spaCy model (a text processing pipeline).
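To make the idea of adding a component concrete, here is a minimal sketch, assuming spaCy v3 (where `add_pipe` takes a component name as a string). The sample sentence is made up for illustration. It checks whether a Doc carries sentence boundaries before and after a sentencizer is added to a blank pipeline.

```python
import spacy

# A blank English pipeline has a tokenizer but no components.
nlp = spacy.blank("en")
pipes_before = list(nlp.pipe_names)

# Hypothetical sample text, not taken from the exercise dataset.
sample = "First sentence. Second sentence."

# Without a sentence segmenter, no sentence boundaries are set on the Doc.
has_sents_before = nlp(sample).has_annotation("SENT_START")

# Add the rule-based sentencizer and process the same text again.
nlp.add_pipe("sentencizer")
pipes_after = list(nlp.pipe_names)
has_sents_after = nlp(sample).has_annotation("SENT_START")

print("Pipes before:", pipes_before, "| sentence boundaries set:", has_sents_before)
print("Pipes after: ", pipes_after, "| sentence boundaries set:", has_sents_after)
```

The sentencizer is rule-based and splits on punctuation, so it is a lightweight choice when sentence boundaries are all you need.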

You will use the first five reviews from the Amazon Fine Food Reviews dataset for this exercise. You can access these reviews by using the texts string.

The spaCy package is already imported for you to use.

This exercise is part of the course

Natural Language Processing with spaCy

Instructions

  • Load a blank spaCy English model and add a sentencizer component to the model.
  • Create a Doc container for texts, store the document's sentences in a list, and print the number of sentences.
  • Print the list of tokens in the second sentence from the sentences list.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Load a blank spaCy English model and add a sentencizer component
nlp = spacy.____("en")
nlp.____("sentencizer")

# Create a Doc container, store its sentences and print the number of sentences
doc = ____
sentences = [____ for s in ____]
print("Number of sentences: ", len(____), "\n")

# Print the list of tokens in the second sentence
print("Second sentence tokens: ", [____ for ____ in sentences[1]])
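For reference, the three instructions can be sketched end to end as follows. This is one possible completion, assuming spaCy v3. The `texts` value here is a made-up stand-in, since the actual Amazon Fine Food Reviews text is preloaded in the exercise environment and not shown on this page.

```python
import spacy

# Hypothetical stand-in for the preloaded `texts` string.
texts = "I love this coffee. It arrived quickly! Would I buy it again? Yes."

# Load a blank spaCy English model and add a sentencizer component
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Create a Doc container, store its sentences and print the number of sentences
doc = nlp(texts)
sentences = [s for s in doc.sents]
print("Number of sentences: ", len(sentences), "\n")

# Print the list of tokens in the second sentence
second_sentence_tokens = [token.text for token in sentences[1]]
print("Second sentence tokens: ", second_sentence_tokens)
```

Each item in `sentences` is a Span, so iterating over `sentences[1]` yields Token objects, and `token.text` gives their string form.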