LoslegenKostenlos loslegen

RegEx with EntityRuler in spaCy

Regular expressions, or RegEx, are used for rule-based information extraction with complex string matching patterns. RegEx can be used to retrieve patterns or replace matching patterns in a string with some other patterns. In this exercise, you will practice using EntityRuler in spaCy to find email addresses in a given text.

spaCy package is already imported for your use. You can use \d to match string patterns representative of a metacharacter that matches any digit from 0 to 9.

A spaCy pattern can use REGEX as an attribute. In this case, a pattern will be of shape [{"TEXT": {"REGEX": "<a given pattern>"}}].

Diese Übung ist Teil des Kurses

Natural Language Processing with spaCy

Kurs anzeigen

Anleitung zur Übung

  • Define a pattern to match phone numbers of the form 8888888888 to be used by the EntityRuler.
  • Load a blank spaCy English model and add an EntityRuler component to the pipeline.
  • Add the compiled pattern to the EntityRuler component.
  • Run the model and print the tuple of text and type of entities for the given text.

Interaktive Übung

Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.

text = "Our phone number is 4251234567."

# Define a pattern to match phone numbers
patterns = [{"label": "PHONE_NUMBERS", "pattern": [{"TEXT": {"REGEX": "(____){____}"}}]}]

# Load a blank model and add an EntityRuler
nlp = spacy.____("en")
ruler = nlp.____("entity_ruler")

# Add the compiled patterns to the EntityRuler
ruler.____(patterns)

# Print the tuple of entities texts and types for the given text
doc = ____(____)
print([(ent.____, ent.____) for ent in doc.____])
Code bearbeiten und ausführen