Sentence segmentation with spaCy
In this exercise, you will practice sentence segmentation. In NLP, segmenting a document into its sentences is a useful basic operation. It is one of the first steps in many more elaborate NLP tasks, such as detecting named entities. Additionally, counting the sentences may provide some insight into the amount of information in the text.
You can access ten food reviews in the list called texts.
The en_core_web_sm model has already been loaded for you as nlp.
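As a minimal sketch of what sentence segmentation looks like on a single text: the review string below is made up for illustration, and the example swaps in spaCy's rule-based sentencizer on a blank English pipeline, which is an assumption made so the snippet runs without downloading en_core_web_sm. With the course's preloaded nlp, the same doc.sents attribute is used.

```python
import spacy

# Assumption: a blank English pipeline with the rule-based "sentencizer"
# stands in for the preloaded en_core_web_sm model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical review text, not taken from the exercise's texts list.
review = "The pasta was great. The service was slow. I would come back!"

doc = nlp(review)

# doc.sents yields one Span per detected sentence.
sentence_texts = [sent.text for sent in doc.sents]
print(sentence_texts)
```

The sentencizer splits on sentence-final punctuation only, so a statistical model such as en_core_web_sm may place boundaries differently on harder text.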
This exercise is part of the course
Natural Language Processing with spaCy
Instructions
- Run the spaCy model on each item in the texts list to compile documents, a list of all Doc containers.
- Extract the sentences of each doc container by iterating through the documents list and append them to a list called sentences.
- Count the number of sentences in each doc container using the sentences list.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Generating a documents list of all Doc containers
documents = [____(text) for text in texts]
# Iterate through documents and append sentences in each doc to the sentences list
sentences = []
for doc in documents:
    sentences.append([s for s in ____.____])
# Find the number of sentences in each doc container
print([len(____) for s in sentences])
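One possible way to carry out the three steps end to end is sketched below. The three short reviews are hypothetical stand-ins for the exercise's texts list, and the blank pipeline with the rule-based sentencizer is again an assumption that replaces the preloaded en_core_web_sm model so the snippet is self-contained.

```python
import spacy

# Assumption: rule-based sentencizer instead of the preloaded model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical sample reviews (the real exercise provides ten).
texts = [
    "Great coffee. Will buy again.",
    "Too sweet for my taste.",
    "Arrived late. Box was damaged. Tasted fine though.",
]

# Step 1: run the pipeline on each text to get a list of Doc containers.
documents = [nlp(text) for text in texts]

# Step 2: collect the sentence Spans of each Doc in a nested list.
sentences = []
for doc in documents:
    sentences.append([s for s in doc.sents])

# Step 3: count the sentences found in each Doc.
num_sentences = [len(s) for s in sentences]
print(num_sentences)  # [2, 1, 3]
```

Keeping sentences as a list of lists preserves which review each sentence came from, which is what makes the per-document count a simple len call.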