RegEx with EntityRuler in spaCy
Regular expressions, or RegEx, support rule-based information extraction with complex string-matching patterns. They can be used to find matching substrings or to replace them with other text. In this exercise, you will practice using EntityRuler in spaCy to find phone numbers in a given text.
The spaCy package is already imported for you. In a regular expression, the metacharacter \d matches any single digit from 0 to 9.
A spaCy token pattern can apply REGEX to a token attribute such as TEXT. In this case, the pattern has the shape [{"TEXT": {"REGEX": "<a given pattern>"}}].
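To illustrate how \d combines with a quantifier and how a regex fits into the pattern shape above, here is a minimal sketch that uses only Python's built-in re module. The four-digit regex and the FOUR_DIGITS label are hypothetical examples chosen for illustration; they are not the exercise solution.

```python
import re

# Hypothetical example regex: \d matches one digit, {4} repeats it four times.
four_digits = r"\d{4}"

# fullmatch requires the whole string to consist of exactly four digits.
print(bool(re.fullmatch(four_digits, "2024")))    # True
print(bool(re.fullmatch(four_digits, "20a4")))    # False

# search accepts a match anywhere inside the string, so longer digit
# runs also pass unless the regex is anchored with ^ and $.
print(bool(re.search(four_digits, "120245")))     # True
print(bool(re.search(r"^\d{4}$", "120245")))      # False

# The same regex placed in the token-pattern shape described above.
# "FOUR_DIGITS" is a made-up label for this sketch.
pattern = {"label": "FOUR_DIGITS", "pattern": [{"TEXT": {"REGEX": four_digits}}]}
print(pattern)
```

The difference between fullmatch and search matters if the regex is applied with search semantics to each token's text, which is our understanding of how spaCy's REGEX works. Under that assumption, anchoring the regex is the safer choice when a token should match in full.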
This exercise is part of the course
Natural Language Processing with spaCy
Exercise instructions
- Define a pattern to match phone numbers of the form 8888888888 to be used by the EntityRuler.
- Load a blank spaCy English model and add an EntityRuler component to the pipeline.
- Add the compiled pattern to the EntityRuler component.
- Run the model and print the tuple of text and type of entities for the given text.
Interactive exercise
Complete the sample code to finish this exercise successfully.
text = "Our phone number is 4251234567."
# Define a pattern to match phone numbers
patterns = [{"label": "PHONE_NUMBERS", "pattern": [{"TEXT": {"REGEX": "(____){____}"}}]}]
# Load a blank model and add an EntityRuler
nlp = spacy.____("en")
ruler = nlp.____("entity_ruler")
# Add the compiled patterns to the EntityRuler
ruler.____(patterns)
# Print a tuple of the text and type of each entity in the given text
doc = ____(____)
print([(ent.____, ent.____) for ent in doc.____])