RegEx with EntityRuler in spaCy

Regular expressions, or RegEx, support rule-based information extraction through complex string-matching patterns. They can be used to retrieve matching patterns from a string or to replace them with other text. In this exercise, you will practice using the EntityRuler in spaCy to find phone numbers in a given text.

The spaCy package is already imported for you. In a regular expression, the metacharacter \d matches any single digit from 0 to 9.

A spaCy token pattern can apply the REGEX operator to a token attribute such as TEXT. In this case, the pattern has the shape [{"TEXT": {"REGEX": "<a given pattern>"}}].
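To make the pattern shape concrete, here is a minimal sketch that uses only Python's standard re module. It assumes the regex is checked against each token's text with a search, and the helper token_matches and the sample token list are hypothetical names introduced for illustration. The label "PHONE_NUMBERS" is taken from the sample code in this exercise.

```python
import re

# Ten consecutive digits: \d matches one digit, {10} repeats it ten times.
PHONE_REGEX = r"(\d){10}"

# Shape of an EntityRuler entry: a label plus a list of token patterns.
# Here the single token pattern applies REGEX to the token's TEXT.
patterns = [
    {"label": "PHONE_NUMBERS", "pattern": [{"TEXT": {"REGEX": PHONE_REGEX}}]}
]


def token_matches(token_text: str) -> bool:
    """Hypothetical helper: test one token's text against the regex."""
    return re.search(PHONE_REGEX, token_text) is not None


# Illustrative token texts (not produced by spaCy here).
sample_tokens = ["Our", "phone", "4251234567", "425-123"]
matches = [t for t in sample_tokens if token_matches(t)]
print(matches)
```

Because a search succeeds anywhere inside the string, a token with more than ten digits would also match. Anchoring the expression with ^ and $ is one way to require exactly ten digits.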

This is a part of the course

“Natural Language Processing with spaCy”

Exercise instructions

  • Define a pattern to match phone numbers of the form 8888888888 to be used by the EntityRuler.
  • Load a blank spaCy English model and add an EntityRuler component to the pipeline.
  • Add the compiled pattern to the EntityRuler component.
  • Run the model on the given text and print a (text, label) tuple for each entity found.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

text = "Our phone number is 4251234567."

# Define a pattern to match phone numbers
patterns = [{"label": "PHONE_NUMBERS", "pattern": [{"TEXT": {"REGEX": "(____){____}"}}]}]

# Load a blank model and add an EntityRuler
nlp = spacy.____("en")
ruler = nlp.____("entity_ruler")

# Add the compiled patterns to the EntityRuler
ruler.____(patterns)

# Print a tuple of each entity's text and label for the given text
doc = ____(____)
print([(ent.____, ent.____) for ent in doc.____])
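For reference, one possible completion of the template is sketched below. It assumes spaCy v3, where nlp.add_pipe accepts the component name as a string and returns the component, and it writes the regex as a raw string. Treat it as one way to fill the blanks rather than the official solution.

```python
import spacy

text = "Our phone number is 4251234567."

# One token whose text contains ten consecutive digits
patterns = [
    {"label": "PHONE_NUMBERS", "pattern": [{"TEXT": {"REGEX": r"(\d){10}"}}]}
]

# Blank English pipeline: tokenizer only, no trained components
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Register the patterns with the EntityRuler
ruler.add_patterns(patterns)

# Run the pipeline and collect (text, label) for each entity
doc = nlp(text)
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

With the English tokenizer splitting the final period into its own token, this should report the ten-digit number as a single PHONE_NUMBERS entity.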

Master the core operations of spaCy and train models for natural language processing. Extract information from unstructured data and match patterns.

Get familiar with spaCy pipeline components, including how to add a component to a pipeline and how to analyze the NLP pipeline. You will also learn several approaches to rule-based information extraction using spaCy's EntityRuler, Matcher, and PhraseMatcher classes and Python's RegEx package.

  • Exercise 1: spaCy pipelines
  • Exercise 2: Adding pipes in spaCy
  • Exercise 3: Analyzing pipelines in spaCy
  • Exercise 4: spaCy EntityRuler
  • Exercise 5: EntityRuler with blank spaCy model
  • Exercise 6: EntityRuler for NER
  • Exercise 7: EntityRuler with multi-patterns in spaCy
  • Exercise 8: RegEx with spaCy
  • Exercise 9: RegEx in Python
  • Exercise 10: RegEx with EntityRuler in spaCy
  • Exercise 11: spaCy Matcher and PhraseMatcher
  • Exercise 12: Matching a single term in spaCy
  • Exercise 13: PhraseMatcher in spaCy
  • Exercise 14: Matching with extended syntax in spaCy
