Sentence segmentation with spaCy
In this exercise, you will practice sentence segmentation. In NLP, segmenting a document into its sentences is a useful basic operation. It is one of the first steps in many more elaborate NLP tasks, such as detecting named entities. The number of sentences can also give a rough indication of how much information a text contains.
You can access ten food reviews in the list called texts.
The en_core_web_sm model has already been loaded for you as nlp.
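Outside the exercise environment, a comparable setup might look like the sketch below. The fallback to a blank English pipeline with the rule-based sentencizer is an assumption added for illustration, and the three short reviews are hypothetical stand-ins, since the actual ten reviews are not shown here.

```python
import spacy

# Try the model named in the exercise; it has to be installed separately.
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Assumption: if the model is missing, fall back to a blank English
    # pipeline with the rule-based sentencizer so doc.sents still works.
    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")

# Hypothetical stand-ins for the food reviews stored in texts.
texts = [
    "The coffee was great. I would buy it again.",
    "Too salty for my taste!",
    "Arrived late. Box was damaged. Tastes fine though.",
]

# Show which pipeline components are active.
print(nlp.pipe_names)
```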
This exercise is part of the course
Natural Language Processing with spaCy
Exercise instructions
- Run the spaCy model on each item in the texts list to compile documents, a list of all Doc containers.
- Extract the sentences of each doc container by iterating through the documents list, and append them to a list called sentences.
- Count the number of sentences in each doc container using the sentences list.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Generating a documents list of all Doc containers
documents = [____(text) for text in texts]
# Iterate through documents and append sentences in each doc to the sentences list
sentences = []
for doc in documents:
    sentences.append([s for s in ____.____])
# Find the number of sentences in each Doc container
print([len(____) for s in sentences])
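For reference, the same three steps can be sketched end to end on stand-in data. This is a minimal sketch, not the course's solution: it assumes a blank English pipeline with spaCy's rule-based sentencizer in place of en_core_web_sm, and the three reviews are made up. It relies on the Doc.sents attribute, which yields one Span per sentence.

```python
import spacy

# Assumption: a blank pipeline plus the rule-based sentencizer stands in
# for en_core_web_sm, so no model download is needed.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical reviews standing in for the texts list.
texts = [
    "The coffee was great. I would buy it again.",
    "Too salty for my taste!",
    "Arrived late. Box was damaged. Tastes fine though.",
]

# Step 1: build one Doc container per text.
documents = [nlp(text) for text in texts]

# Step 2: collect the sentence spans of each Doc.
sentences = []
for doc in documents:
    sentences.append([s for s in doc.sents])

# Step 3: count the sentences in each Doc.
counts = [len(s) for s in sentences]
print(counts)  # [2, 1, 3] with the sentencizer
```

For larger collections, nlp.pipe(texts) processes the texts as a stream and is usually faster than calling nlp on each text in turn.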