
RegEx with EntityRuler in spaCy

Regular expressions, or RegEx, are used for rule-based information extraction with complex string-matching patterns. A RegEx can retrieve the substrings that match a pattern or replace them with other text. In this exercise, you will practice using EntityRuler in spaCy to find phone numbers in a given text.

The spaCy package is already imported for you. In a regular expression, \d is a metacharacter that matches any single digit from 0 to 9.
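As a quick illustration of \d outside spaCy, here is a minimal sketch using Python's standard re module. The name phone_pattern and the \b word boundaries are choices made for this example, not part of the exercise; the boundaries simply keep the match from landing inside a longer run of digits.

```python
import re

text = "Our phone number is 4251234567."

# \d matches one digit; {10} repeats it exactly ten times.
# \b (word boundary) is an assumption added for this sketch so that
# longer digit runs are not partially matched.
phone_pattern = re.compile(r"\b\d{10}\b")

matches = phone_pattern.findall(text)
print(matches)
```

Dropping the \b anchors would also let the pattern match the first ten digits of a longer number, which may or may not be what you want.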

A spaCy pattern can apply REGEX to a token attribute such as TEXT. In this case, the pattern has the form [{"TEXT": {"REGEX": "<a given pattern>"}}].

This exercise is part of the course

Natural Language Processing with spaCy


Exercise instructions

  • Define a pattern for the EntityRuler that matches phone numbers written as ten consecutive digits, such as 8888888888.
  • Load a blank spaCy English model and add an EntityRuler component to the pipeline.
  • Add the compiled pattern to the EntityRuler component.
  • Run the model and print the tuple of text and type of entities for the given text.
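The four steps above can be sketched as follows. This is one possible way to complete them, assuming spaCy v3, where nlp.add_pipe takes the component name as a string and returns the component. The regex (\d){10} is one pattern that fits the ten-digit form; the variable name entities is introduced here for illustration.

```python
import spacy

text = "Our phone number is 4251234567."

# Single-token pattern: the token's text must contain ten consecutive digits.
patterns = [
    {"label": "PHONE_NUMBERS", "pattern": [{"TEXT": {"REGEX": r"(\d){10}"}}]}
]

# Blank English pipeline (tokenizer only) plus an EntityRuler component.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Register the patterns with the ruler.
ruler.add_patterns(patterns)

# Run the pipeline and collect (text, label) pairs for each entity.
doc = nlp(text)
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

Because the REGEX is checked against one token at a time, this pattern relies on the tokenizer keeping the ten digits together as a single token; a number written with spaces or hyphens would need a multi-token pattern instead.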

Hands-on interactive exercise

Try this exercise by completing the sample code below.

text = "Our phone number is 4251234567."

# Define a pattern to match phone numbers
patterns = [{"label": "PHONE_NUMBERS", "pattern": [{"TEXT": {"REGEX": "(____){____}"}}]}]

# Load a blank model and add an EntityRuler
nlp = spacy.____("en")
ruler = nlp.____("entity_ruler")

# Add the compiled patterns to the EntityRuler
ruler.____(patterns)

# Print a list of (text, label) tuples for the entities found in the given text
doc = ____(____)
print([(ent.____, ent.____) for ent in doc.____])