Get startedGet started for free

NER with NLTK

You're now going to have some fun with named-entity recognition! A scraped news article has been pre-loaded into your workspace. Your task is to use nltk to find the named entities in this article.

What might the article be about, given the names you found?

Along with nltk, sent_tokenize and word_tokenize from nltk.tokenize have been pre-imported.

This exercise is part of the course

Introduction to Natural Language Processing in Python

View Course

Exercise instructions

  • Tokenize article into sentences.
  • Tokenize each sentence in sentences into words using a list comprehension.
  • Inside a list comprehension, tag each tokenized sentence into parts of speech using nltk.pos_tag().
  • Chunk each tagged sentence into named-entity chunks using nltk.ne_chunk_sents(). Along with pos_sentences, specify the additional keyword argument binary=True.
  • Loop over each sentence and each chunk, and test whether it is a named-entity chunk by testing if it has the attribute label, and if the chunk.label() is equal to "NE". If so, print that chunk.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Tokenize the article into sentences: sentences
sentences = ____

# Tokenize each sentence into words: token_sentences
token_sentences = [____ for sent in ____]

# Tag each tokenized sentence into parts of speech: pos_sentences
pos_sentences = [____ for sent in ____] 

# Create the named entity chunks: chunked_sentences
chunked_sentences = ____

# Test for stems of the tree with 'NE' tags
for sent in chunked_sentences:
    for chunk in sent:
        if hasattr(chunk, "label") and ____ == "____":
            print(chunk)
Edit and Run Code