
RegEx with EntityRuler in spaCy

Regular expressions (RegEx) support rule-based information extraction by matching complex string patterns. RegEx can be used to retrieve the substrings that match a pattern or to replace them with other text. In this exercise, you will practice using spaCy's EntityRuler to find phone numbers in a given text.

The spaCy package is already imported for your use. To build the pattern, you can use \d, a metacharacter that matches any single digit from 0 to 9.
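Before involving spaCy, it can help to see \d on its own with Python's standard `re` module. The sketch below is illustrative only; the sample text and the `<PHONE>` placeholder are made up. It retrieves ten-digit runs with `re.findall` and replaces them with `re.sub`, the two uses of RegEx mentioned above. The `{10}` quantifier requires exactly ten digits, and the `\b` word boundaries keep the pattern from matching inside a longer run of digits.

```python
import re

# Hypothetical sample text for illustration.
text = "Reach us at 8888888888 or 5555555555; the order id is 1234."

# \d matches one digit (0-9); {10} requires exactly ten in a row.
# \b word boundaries stop the pattern from matching inside longer digit runs.
phone_pattern = r"\b\d{10}\b"

# Retrieve every match.
found = re.findall(phone_pattern, text)
print(found)

# Replace every match with a placeholder.
masked = re.sub(phone_pattern, "<PHONE>", text)
print(masked)
```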

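A spaCy token pattern can use REGEX as an operator on a token attribute. In this case, the pattern has the shape [{"TEXT": {"REGEX": "<a given pattern>"}}], which matches a single token whose text matches the given regular expression.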
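The token-pattern shape can be tried directly with spaCy's `Matcher` before wiring it into an EntityRuler. This is a minimal sketch assuming spaCy v3; the match name `PHONE_NUMBER` and the sample sentence are hypothetical. Each dict in the list describes one token, so the regex is tested against individual token texts. The regex is applied as a search inside the token text, so the `^` and `$` anchors are what keep a longer digit token such as `123456789012` from matching.

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: only the tokenizer runs.
nlp = spacy.blank("en")
doc = nlp("Call 8888888888 today, not 123456789012.")

# One dict per token: the TEXT of a single token must match the regex.
# ^ and $ anchor the match to the whole token text.
pattern = [{"TEXT": {"REGEX": r"^\d{10}$"}}]

# "PHONE_NUMBER" is a hypothetical match name chosen for illustration.
matcher = Matcher(nlp.vocab)
matcher.add("PHONE_NUMBER", [pattern])

# Each match is (match_id, start_token, end_token); slice the Doc to get text.
matched = [doc[start:end].text for _, start, end in matcher(doc)]
print(matched)
```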

Instructions

  • Define a pattern to match phone numbers of the form 8888888888 to be used by the EntityRuler.
  • Load a blank spaCy English model and add an EntityRuler component to the pipeline.
  • Add the defined pattern to the EntityRuler component.
  • Run the model on the given text and print a tuple of the text and label (entity type) for each entity found.
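The four steps above can be sketched end to end as follows. This assumes spaCy v3, where the EntityRuler is added by its registered factory name `"entity_ruler"` (spaCy v2 instead constructed `EntityRuler(nlp)` and passed the object to `add_pipe`). The label `PHONE_NUMBER` and the sample text are hypothetical stand-ins for the exercise's own values.

```python
import spacy

# Hypothetical label; the pattern matches a single token of exactly ten digits.
patterns = [
    {"label": "PHONE_NUMBER", "pattern": [{"TEXT": {"REGEX": r"^\d{10}$"}}]}
]

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# spaCy v3: add the EntityRuler by its factory name, then load the patterns.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(patterns)

# Hypothetical sample text for illustration.
text = "Our office number is 8888888888 and the fax is 5555555555."
doc = nlp(text)

# Tuple of (entity text, entity label) for every entity the ruler found.
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

Because the blank tokenizer splits the trailing period off `5555555555.`, the second number is still a standalone ten-digit token and is matched as well.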