Get startedGet started for free

Named Entity Recognition

1. Named Entity Recognition

Welcome to our first video on named entity recognition!

2. What is Named Entity Recognition?

Named Entity Recognition or NER for short is a natural language processing task used to identify important named entities in the text -- such as people, places and organizations -- they can even be dates, states, works of art and other categories depending on the libraries and notation you use. NER can be used alongside topic identification, or on its own to determine important items in a text or answer basic natural language understanding questions such as who? what? when and where?

3. Example of NER

For example, take this piece of text which is from the English Wikipedia article on Albert Einstein. The text has been highlighted for different types of named entities that were found using the Stanford NER library. You can see the dates, locations, persons and organizations found and extract infomation on the text based on these named entities. You can use NER to solve problems like fact extraction as well as which entities are related using computational language models. For example, in this text we can see that Einstein has something to do with the United States, Adolf Hitler and Germany. We can also see by token proximity that Betrand Russel and Einstein created the Russel-Einstein manifesto -- all from simple entity highlighting.

4. nltk and the Stanford CoreNLP Library

NLTK allows you to interact with named entity recognition via it's own model, but also the aforementioned Stanford library. The Stanford library integration requires you to perform a few steps before you can use it, including installing the required Java files and setting system environment variables. You can also use the standford library on its own without integrating it with NLTK or operate it as an API server. The stanford CoreNLP library has great support for named entity recognition as well as some related nlp tasks such as coreference (or linking pronouns and entities together) and dependency trees to help with parsing meaning and relationships amongst words or phrases in a sentence.

5. Using nltk for Named Entity Recognition

For our simple use case, we will use the built-in named entity recognition with NLTK. To do so, we take a normal sentence, and preprocess it via tokenization. Then, we can tag the sentence for parts of speech. This will add tags for proper nouns, pronouns, adjective, verbs and other part of speech that NLTK uses based on an english grammar. When we take a look at the tags, we see New and York are tagged NNP which is the tag for a proper noun, singular.

6. nltk's ne_chunk()

Then we pass this tagged sentence into the ne_chunk function, or named entity chunk, which will return the sentence as a tree. NLTK Tree's might look a bit different than trees you might use in other libraries, but they do have leaves and subtrees representing more complex grammar. This tree shows the named entities tagged as their own chunks such as GPE or geopolitical entity for New York, or MOMA and Metro as organizations. It also identifies Ruth Reichl as a person. It does so without consulting a knowledge base, like wikipedia, but instead uses trained statistical and grammatical parsers.

7. Let's practice!

Now it's your turn to practice some named entity recognition using nltk.