NER with NLTK
You're now going to have some fun with named-entity recognition! A scraped news article has been pre-loaded into your workspace. Your task is to use nltk to find the named entities in this article.
What might the article be about, given the names you found?
Along with nltk, the functions sent_tokenize and word_tokenize from nltk.tokenize have been pre-imported.
This exercise is part of the course Introduction to Natural Language Processing in Python.
Exercise instructions
- Tokenize article into sentences.
- Tokenize each sentence in sentences into words using a list comprehension.
- Inside a list comprehension, tag each tokenized sentence with parts of speech using nltk.pos_tag().
- Chunk each tagged sentence into named-entity chunks using nltk.ne_chunk_sents(). Along with pos_sentences, pass the additional keyword argument binary=True, which makes the chunker label every entity "NE" instead of using specific types such as PERSON or ORGANIZATION.
- Loop over each sentence and each chunk, and test whether it is a named-entity chunk by checking that it has the attribute label and that chunk.label() is equal to "NE". If so, print that chunk.
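The label test in the last step works because each chunked sentence is an nltk.Tree whose children are either plain (word, tag) tuples or nested Tree subtrees for the entity chunks. Tuples have no label attribute, so hasattr() separates ordinary tokens from chunks. A minimal sketch of just that check, run on a hand-built tree; the words and tags are made-up illustration data, not real chunker output:

```python
from nltk.tree import Tree

# Hand-built tree shaped like ne_chunk output with binary=True:
# entity chunks are Tree objects labeled "NE", other tokens stay (word, tag) tuples.
# The sentence is hypothetical illustration data.
sent = Tree("S", [
    Tree("NE", [("Alice", "NNP")]),
    ("visited", "VBD"),
    Tree("NE", [("Paris", "NNP")]),
    (".", "."),
])

named_entities = []
for chunk in sent:
    # Tuples lack a .label() method, so only chunk subtrees pass the first test
    if hasattr(chunk, "label") and chunk.label() == "NE":
        print(chunk)
        named_entities.append(chunk)
```

Only the two "NE" subtrees are printed; the tuples for "visited" and "." are skipped.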
Have a go at this exercise by completing this sample code.
# Tokenize the article into sentences: sentences
sentences = ____

# Tokenize each sentence into words: token_sentences
token_sentences = [____ for sent in ____]

# Tag each tokenized sentence into parts of speech: pos_sentences
pos_sentences = [____ for sent in ____]

# Create the named entity chunks: chunked_sentences
chunked_sentences = ____

# Test for subtrees (chunks) labeled 'NE'
for sent in chunked_sentences:
    for chunk in sent:
        if hasattr(chunk, "label") and ____ == "____":
            print(chunk)