
Named entity recognition on transcribed text

1. Named entity recognition on transcribed text

Now that you've done some sentiment analysis on Acme's transcribed calls, you decide named entity recognition is a good next step. Entity recognition is the process of extracting objects of interest from text. To do this, you turn to spaCy, a natural language processing library.

2. Installing spaCy

To get started with spaCy, you can install it using pip. Once spaCy is installed, we can use spaCy's built-in language models for natural language processing by downloading them using the spacy download command on the command line.

3. Using spaCy

spaCy works by turning blocks of text into docs. Docs are made up of tokens and spans. You can think of tokens as individual words or punctuation marks, and spans as groups of tokens, such as sentences. Let's see. First we import spacy. Then we load the language model and save it to the conventional variable nlp. Then to create a spaCy doc, we pass the string of text we want to use to nlp. Now that we've got a spaCy doc, we can use spaCy's built-in features to find out more.
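As a minimal sketch of those steps: the model name en_core_web_sm and the sample sentence are assumptions, since the lesson doesn't name either, and the fallback to a blank English pipeline is added here only so the snippet still runs if no model has been downloaded.

```python
# Install steps, run once on the command line (model name is an assumption):
#   pip install spacy
#   python -m spacy download en_core_web_sm
import spacy

try:
    # Load a trained English pipeline and keep it in the conventional variable nlp
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Fallback added for illustration: a blank pipeline has only a tokenizer
    nlp = spacy.blank("en")

# Hypothetical sample text standing in for a transcribed call
text = "I ordered a smartphone from your Sydney store."

# Passing a string to nlp returns a spaCy doc
doc = nlp(text)
print(type(doc).__name__, len(doc))
```

With the blank fallback you still get tokens, but not the sentence boundaries or named entities that a trained model provides.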

4. spaCy tokens

You can see what tokens a doc contains and the index where they start using dot text and dot idx on the tokens in your doc. The number returned by idx is the character offset of the token's first character within the original text.
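Here is a small sketch of dot text and dot idx in use. It assumes a blank English pipeline, which is enough for tokenization, and a made-up sample sentence.

```python
import spacy

# A blank pipeline is an assumption here; it contains only the tokenizer
nlp = spacy.blank("en")
doc = nlp("I ordered a smartphone from your Sydney store.")

# Pair each token's text with the character offset where it starts
token_offsets = [(token.text, token.idx) for token in doc]

for text, idx in token_offsets:
    print(text, idx)
```

The token smartphone, for example, starts at character 12 of the string, not at token position 3.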

5. spaCy sentences

You can see where the sentences are with dot sents. Here spaCy has broken the text in our doc into sentences.
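A rough sketch of dot sents follows. To avoid needing a downloaded model, it assumes spaCy v3 and uses the rule-based sentencizer component on a blank pipeline; a trained model would normally set sentence boundaries through its parser instead. The sample text is made up.

```python
import spacy

# Assumption: spaCy v3, where add_pipe takes the component's string name
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based splitting on sentence-final punctuation

doc = nlp("I ordered a smartphone from your Sydney store. I haven't received it yet.")

# doc.sents yields one span per sentence
sentences = [sent.text for sent in doc.sents]

for sentence in sentences:
    print(sentence)
```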

6. spaCy named entities

Beautiful, now let's try using spaCy's named entity recognition. A named entity is an object which is given a name, such as a person, product, location or date. spaCy has several of these named entity types built in, which it can recognize straight away.

7. spaCy named entities

You can access the named entities in a doc using dot ents. Let's try. dot text shows us the text of the entity that the label belongs to. And dot label underscore gives us the named entity label of that text. You can see Sydney is given GPE for geopolitical entity.
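The sketch below shows dot ents, dot text and dot label underscore. So that it runs without a downloaded model, the GPE entity is set by hand on a blank pipeline; that manual step is an assumption for illustration, and with a trained model the ner step would fill in doc.ents for you.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("I ordered a smartphone from your Sydney store.")

# Illustration only: mark token 6 ("Sydney") as a GPE entity by hand.
# A trained pipeline's ner component would normally do this.
doc.ents = [Span(doc, 6, 7, label="GPE")]

# Each entity is a span with its text and a string label
entities = [(ent.text, ent.label_) for ent in doc.ents]

for text, label in entities:
    print(text, label)
```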

8. Custom named entities

spaCy's built-in named entities are excellent but depending on your problem, you'll probably want to develop some of your own. Since Acme is a technology company, you decide it's a good idea to create a custom entity recognizer for their products. To do so, you can use spaCy's pipeline class EntityRuler. A pipeline is what spaCy uses to parse text into a doc. You can see the current pipeline you're using by calling pipeline on nlp. In our case, our pipeline has three steps, a tagger, a parser and ner for named entity recognition.

9. Changing the pipeline

The EntityRuler class allows us to create another step in the pipeline. We start by making an instance of EntityRuler called ruler, passing it nlp. Then we use add patterns to add the token pattern we'd like spaCy to consider an entity. In our case, we want the smartphone token to have the entity label PRODUCT. We can add this rule to the pipeline before ner so we can be sure it gets used.
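Here is one way this could look. Note that it assumes spaCy v3, where the ruler is created by passing the string name entity_ruler to add_pipe rather than by building an EntityRuler instance yourself. The helper add_product_ruler is a hypothetical name, and the blank pipeline is used only so the sketch runs without a model download.

```python
import spacy

def add_product_ruler(nlp):
    """Add an entity ruler that labels 'smartphone' as PRODUCT (hypothetical helper)."""
    if "ner" in nlp.pipe_names:
        # Place the ruler before ner so its matches are set first
        ruler = nlp.add_pipe("entity_ruler", before="ner")
    else:
        # A blank pipeline has no ner step, so just append the ruler
        ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns([{"label": "PRODUCT", "pattern": "smartphone"}])
    return ruler

nlp = spacy.blank("en")  # assumption: swap in a trained pipeline if you have one
ruler = add_product_ruler(nlp)

# The step names currently in the pipeline
print(nlp.pipe_names)
```

Putting the ruler before ner matters because ner leaves entities that are already set in place and works around them.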

10. Changing the pipeline

Now when we check our pipeline we've got a new step called entity ruler.

11. Testing the new pipeline

Let's try it with our doc from before. You can see the token smartphone now has the PRODUCT named entity label.
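A small end-to-end sketch of that check, again assuming spaCy v3 and a blank pipeline with only the entity ruler in it. The sample sentence is made up. With a trained model you would also expect built-in entities such as Sydney to show up.

```python
import spacy

# Assumption: spaCy v3 string-name API; blank pipeline keeps this self-contained
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PRODUCT", "pattern": "smartphone"}])

doc = nlp("I ordered a smartphone from your Sydney store.")

# Collect the entity text and label pairs the ruler produced
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```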

12. Let's rocket and practice spaCy!

Whoa, we covered a lot of ground in this lesson. Let's make it happen!