Text processing with spaCy

Every NLP application consists of several text processing steps. You have already learned some of these steps, including tokenization, lemmatization, sentence segmentation and named entity recognition.

spaCy NLP Pipeline

In this exercise, you'll continue to practice with text processing steps in spaCy, such as breaking the text into sentences and extracting named entities. You will use the first five reviews from the Amazon Fine Food Reviews dataset for this exercise. You can access these reviews by using the texts object.

The en_core_web_sm model has already been loaded for you to use, and you can access it by using nlp. The list of Doc containers for each item in texts is also pre-loaded and accessible at documents.

Create sentences, a list of lists containing all the sentences in each Doc container in documents, using a list comprehension.
Create and print num_sentences, a list containing the number of sentences in each Doc container, using the len() function.
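
The two steps above can be sketched as follows. This is only an illustration under stated assumptions, not the exercise's reference solution: a blank English pipeline with spaCy's rule-based sentencizer component stands in for the pre-loaded en_core_web_sm model (so no model download is needed), and two made-up sentences stand in for the reviews in texts.

```python
import spacy

# Assumption: a blank English pipeline plus the rule-based "sentencizer"
# replaces the pre-loaded en_core_web_sm model used in the exercise.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Assumption: made-up texts standing in for the first five reviews in `texts`.
texts = [
    "I love this coffee. It tastes great!",
    "The package arrived late.",
]

# One Doc container per text, mirroring the pre-loaded `documents` list.
documents = [nlp(text) for text in texts]

# A list of lists: the inner list holds the sentence spans of one Doc.
sentences = [[sent for sent in doc.sents] for doc in documents]

# The number of sentences in each Doc.
num_sentences = [len(doc_sentences) for doc_sentences in sentences]
print(num_sentences)  # [2, 1]
```

In the exercise environment, nlp and documents already exist, so only the two list comprehensions are needed. Sentence boundaries from en_core_web_sm come from its dependency parser and may differ from the punctuation-based splits shown here.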
