Spanish NER with polyglot
You'll continue your exploration of polyglot, now with some Spanish annotation. This article is not written by a newspaper, so it is your first example of a more blog-like text. How do you think that might compare when finding entities?
The Text object has been created as txt, and each entity has been printed, as you can see in the IPython Shell.
Your specific task is to determine how many of the entities contain the words "Márquez" or "Gabo". These refer to the same person in different ways!
This exercise is part of the course Introduction to Natural Language Processing in Python.
Exercise instructions
- Iterate over all of the entities of txt, using ent as your iterator variable.
- Check whether the entity contains "Márquez" or "Gabo". If it does, increment count. Don't forget to include the accented á in "Márquez"!
- Hit 'Submit Answer' to see what percentage of entities refer to Gabriel García Márquez (aka Gabo).
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize the count variable: count
count = 0
# Iterate over all the entities
____
    # Check whether the entity contains 'Márquez' or 'Gabo'
    ____
        # Increment count
        ____
# Print count
print(count)
# Calculate the share of entities that refer to "Gabo" (a fraction between 0 and 1, not 0-100): percentage
percentage = count / len(txt.entities)
print(percentage)
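One way the blanks could be filled in is sketched below. So that it runs without polyglot or its Spanish models installed, it uses a hypothetical stand-in, sample_entities, in place of txt.entities, and it assumes each entity behaves like a list of word tokens, so that the in operator tests for a whole token. The helper name count_mentions and the sample data are illustrative only and are not part of the exercise.

```python
# Hypothetical stand-in for txt.entities: each entity is a list of word tokens.
# With polyglot itself, you would loop over txt.entities instead.
sample_entities = [
    ["Gabriel", "García", "Márquez"],
    ["Colombia"],
    ["Gabo"],
    ["Aracataca"],
    ["García", "Márquez"],
]


def count_mentions(entities, names=("Márquez", "Gabo")):
    """Count the entities that contain at least one of the given tokens."""
    count = 0
    for ent in entities:
        # Token-level membership test: "Márquez" must appear as a whole word
        if any(name in ent for name in names):
            count += 1
    return count


count = count_mentions(sample_entities)
print(count)

# Share of entities that mention the author, as a fraction between 0 and 1
fraction = count / len(sample_entities)
print(fraction)

# Formatted as a percentage for readability
print(f"{fraction:.0%}")
```

In the exercise itself, the same loop body would go directly under for ent in txt.entities:, with count incremented in place rather than returned from a helper.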