RegEx with EntityRuler in spaCy

Regular expressions, or RegEx, support rule-based information extraction with complex string-matching patterns. They can be used to find matching substrings or to replace them with other text. In this exercise, you will practice using EntityRuler in spaCy to find phone numbers in a given text.

The spaCy package is already imported for you. The metacharacter \d matches any single digit from 0 to 9.
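As a quick illustration of \d outside spaCy, the sketch below uses Python's built-in re module on a made-up sentence. The sample text and variable names are assumptions chosen for illustration, not part of the exercise. It shows \d on its own, followed by +, and followed by a {n} quantifier.

```python
import re

# Hypothetical sample text, chosen only for illustration
sample = "Room 42 is on floor 7."

# \d matches exactly one digit per match
single_digits = re.findall(r"\d", sample)

# \d+ matches one or more consecutive digits
digit_runs = re.findall(r"\d+", sample)

# \d{2} matches exactly two consecutive digits
two_digit_runs = re.findall(r"\d{2}", sample)

print(single_digits)   # ['4', '2', '7']
print(digit_runs)      # ['42', '7']
print(two_digit_runs)  # ['42']
```

Writing the pattern as a raw string (r"...") keeps Python from treating the backslash as a string escape.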

A spaCy token pattern can apply the REGEX operator to a token attribute such as TEXT. In this case, the pattern takes the form [{"TEXT": {"REGEX": "<a given pattern>"}}].
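A minimal sketch of this pattern shape, assuming spaCy v3, where nlp.add_pipe("entity_ruler") returns the ruler component. The YEAR label, the sample sentence, and the four-digit pattern are illustrative assumptions rather than the exercise solution.

```python
import spacy

# Hypothetical sample text, chosen only for illustration
sample = "The conference was held in 2019."

# One-token pattern: the token's text must be exactly four digits.
# "YEAR" is an illustrative label, not one defined by the exercise.
patterns = [
    {"label": "YEAR", "pattern": [{"TEXT": {"REGEX": r"^\d{4}$"}}]}
]

# A blank English pipeline has a tokenizer but no trained components
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(patterns)

doc = nlp(sample)
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)  # [('2019', 'YEAR')]
```

The ^ and $ anchors are a deliberate choice here: as far as I know, spaCy applies REGEX with search semantics, so an unanchored \d{4} would also match tokens that merely contain four digits in a row.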

This exercise is part of the course “Natural Language Processing with spaCy”.

Exercise instructions

  • Define a pattern to match phone numbers of the form 8888888888 to be used by the EntityRuler.
  • Load a blank spaCy English model and add an EntityRuler component to the pipeline.
  • Add the pattern to the EntityRuler component.
  • Run the model on the given text and print a list of (text, label) tuples for its entities.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

text = "Our phone number is 4251234567."

# Define a pattern to match phone numbers
patterns = [{"label": "PHONE_NUMBERS", "pattern": [{"TEXT": {"REGEX": "(____){____}"}}]}]

# Load a blank model and add an EntityRuler
nlp = spacy.____("en")
ruler = nlp.____("entity_ruler")

# Add the patterns to the EntityRuler
ruler.____(patterns)

# Print a list of (text, label) tuples for the entities in the given text
doc = ____(____)
print([(ent.____, ent.____) for ent in doc.____])