1. Named entity recognition
The final technique we will learn as part of this chapter is named entity recognition.
2. Applications
Named entity recognition, or NER, has a host of extremely useful applications. It is used to build efficient search algorithms and question answering systems. For instance, let us say you have a piece of text and you ask your system about the people that are being talked about in the text. NER would help the system answer this question by identifying all the entities in the text that refer to a person. NER has also found applications with news providers, who use it to categorize their articles, and customer service centers, who use it to classify and record their complaints efficiently.
3. Named entity recognition
Let us now get down to the definitions. A named entity is anything that can be denoted with a proper name or a proper noun. Named entity recognition, or NER, is therefore the process of identifying such named entities in a piece of text and classifying them into predefined categories such as person, organization, country, etc. For example, consider the text "John Doe is a software engineer working at Google. He lives in France." Performing NER on this text will tell us that there are three named entities: John Doe, who is a person; Google, which is an organization; and France, which is a country (or geopolitical entity).
4. NER using spaCy
Like POS tagging, performing NER is extremely easy using spaCy's pre-trained models. Let's try to find the named entities in the same sentence we used earlier. As usual, we import the spacy library, load the required model and create a Doc object for the string. When we do this, spaCy automatically computes all the named entities and makes them available as the ents attribute of doc. Therefore, to access each named entity and its category, we use a list comprehension to loop over doc.ents and create a tuple containing the entity name, which is accessed using ent.text, and the entity category, which is accessed using ent.label_. Printing this list out will give the following output. We see that spaCy has correctly identified and classified all the named entities in this string.
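As a rough sketch of the steps just described, assuming spaCy and the en_core_web_sm model are installed, the code might look like the following. The names named_entities and demo are our own illustrative choices and are not part of spaCy.

```python
def named_entities(doc):
    """Return (entity text, entity label) tuples for a spaCy Doc."""
    # doc.ents holds the entities spaCy computed when the Doc was created.
    return [(ent.text, ent.label_) for ent in doc.ents]


def demo():
    # Imported here so the helper above can be reused without loading spaCy.
    import spacy

    # Assumption: the model was downloaded beforehand with
    #   python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("John Doe is a software engineer working at Google. "
              "He lives in France.")
    print(named_entities(doc))
```

Calling demo() prints one tuple per entity. The exact labels can vary between model versions, so it is worth checking the output on your own installation.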
5. NER annotations in spaCy
Currently, spaCy's models are capable of identifying more than 15 different types of named entities. The complete list of categories and their annotations can be found in spaCy's documentation. Here is a snapshot of the page.
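The annotations can also be looked up from code. The sketch below is one possible way to do it, assuming spaCy and the en_core_web_sm model are installed. It relies on spacy.explain, which returns a short description for a label, and on the labels attribute of the model's "ner" pipeline component. The names label_glossary and demo are hypothetical helpers written for this illustration.

```python
def label_glossary(labels, explain):
    """Map each label to a description obtained from the `explain` callable."""
    # `explain` may return None for labels it does not know about.
    return {label: explain(label) or "(no description)"
            for label in sorted(labels)}


def demo():
    # Imported here so label_glossary can be used without loading spaCy.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    # Labels the loaded model's entity recognizer is able to predict.
    labels = nlp.get_pipe("ner").labels
    for label, description in label_glossary(labels, spacy.explain).items():
        print(f"{label:<12}{description}")
```

Calling demo() prints one line per entity label supported by the loaded model, which may differ from the snapshot depending on the model version.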
6. A word of caution
In this chapter, we have used spaCy's models to accomplish several tasks. However, remember that spaCy's models are not perfect, and their performance depends on the data they were trained with and the data they are being used on. For instance, if we are trying to extract named entities from texts in a heavily technical field, such as medicine, spaCy's pre-trained models may not do such a great job. In such cases, it is better to train your own models with your specialized data. Also, remember that spaCy's models are language specific. This is understandable considering that each language has its own grammar and nuances. The en_core_web_sm model that we've been using is, as the name suggests, only suitable for English texts.
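To make the language-specific point concrete, here is a small sketch of how one might pick a model by language code. The function model_for_language and the MODEL_BY_LANGUAGE table are hypothetical and written only for this illustration. The package names listed are small spaCy models for each language, and each one has to be downloaded separately before spacy.load can use it.

```python
# Illustrative mapping from language code to a small spaCy model package.
# Assumption: extend this table with whichever models you have installed.
MODEL_BY_LANGUAGE = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
    "fr": "fr_core_news_sm",
    "es": "es_core_news_sm",
}


def model_for_language(lang):
    """Return the model package name for a language code, or raise ValueError."""
    try:
        return MODEL_BY_LANGUAGE[lang]
    except KeyError:
        raise ValueError(f"No model configured for language {lang!r}") from None


def load_model(lang):
    # Imported here so the lookup above works without spaCy installed.
    import spacy

    return spacy.load(model_for_language(lang))
```

Failing loudly on an unknown language is a deliberate choice here: silently falling back to the English model would produce unreliable entities for non-English text.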
7. Let's practice!
This concludes our lesson on named entity recognition. Let us practice our understanding of this technique in the exercises.