Sentence segmentation with spaCy
In this exercise, you will practice sentence segmentation. In NLP, segmenting a document into its sentences is a useful basic operation and one of the first steps in many more elaborate NLP tasks, such as named entity recognition. Additionally, the number of sentences can give you a rough sense of how much information a text contains.
You can access ten food reviews in the list called texts. The en_core_web_sm model has already been loaded for you as nlp.
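If you are working outside the course environment, a minimal setup might look like the sketch below. It assumes the en_core_web_sm model has already been installed (for example with python -m spacy download en_core_web_sm); the texts list here is only a placeholder for the course's ten food reviews.

# Minimal setup sketch (assumption: en_core_web_sm is installed locally)
import spacy

# Load the small English pipeline as nlp
nlp = spacy.load("en_core_web_sm")

# Placeholder reviews; the course environment provides ten food reviews as texts
texts = [
    "The soup was cold, but the bread was fresh.",
    "Great service! We will definitely come back.",
]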
This exercise is part of the course Natural Language Processing with spaCy.
Exercise instructions
- Run the spaCy model on each item in the texts list to compile documents, a list of all Doc containers.
- Extract the sentences of each doc container by iterating through the documents list and append them to a list called sentences.
- Count the number of sentences in each doc container using the sentences list.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Generating a documents list of all Doc containers
documents = [____(text) for text in texts]
# Iterate through documents and append sentences in each doc to the sentences list
sentences = []
for doc in documents:
sentences.append([s for s in ____.____])
# Count the number of sentences in each doc container
print([len(____) for s in sentences])
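For reference, here is one way the blanks could be filled in. This is a sketch that assumes nlp and texts are defined as described above; it relies on spaCy's Doc.sents attribute, which yields the sentence spans of a parsed document.

# Generate a documents list of all Doc containers
documents = [nlp(text) for text in texts]

# Iterate through documents and append the sentences of each doc to the sentences list
sentences = []
for doc in documents:
    sentences.append([s for s in doc.sents])

# Count the number of sentences in each doc container
print([len(s) for s in sentences])

The final print statement outputs a list of ten integers, one sentence count per review in texts.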