RegEx with EntityRuler in spaCy
Regular expressions (RegEx) support rule-based information extraction with complex string-matching patterns. They can be used to retrieve matches from a string or to replace matches with other text. In this exercise, you will practice using EntityRuler in spaCy to find phone numbers in a given text.
The spaCy package is already imported for you. You can use \d, a metacharacter that matches any single digit from 0 to 9.
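As a minimal sketch of retrieving and replacing with \d, the snippet below uses Python's built-in re module. The sample sentence and the variable names are made up for illustration and are not part of the exercise.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "The library opened in 1998 and moved in 2014."

# \d matches one digit; {4} repeats the preceding item exactly four times.
years = re.findall(r"\d{4}", text)
print(years)  # ['1998', '2014']

# re.sub replaces every match (here, each single digit) with another string.
masked = re.sub(r"\d", "#", text)
print(masked)  # The library opened in #### and moved in ####.
```

A quantifier such as {4} applies to whatever precedes it, so the same idea extends to digit runs of any fixed length.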
A spaCy pattern can use REGEX as an attribute. In this case, the pattern has the shape [{"TEXT": {"REGEX": "<a given pattern>"}}].
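To illustrate that pattern shape without touching the exercise itself, here is a small sketch that uses spaCy's Matcher, which accepts the same token patterns. It assumes spaCy v3 is installed. The ZIP_CODE name, the five-digit rule, and the sample sentence are hypothetical.

```python
from spacy.lang.en import English
from spacy.matcher import Matcher

# A blank English pipeline: tokenizer only, no trained components.
nlp = English()

# One token whose full text is exactly five digits (hypothetical ZIP_CODE rule).
# The raw string keeps the backslash in \d intact.
pattern = [{"TEXT": {"REGEX": r"^\d{5}$"}}]

matcher = Matcher(nlp.vocab)
matcher.add("ZIP_CODE", [pattern])

doc = nlp("Ship it to ZIP code 98052 by Friday.")

# Each match is a (match_id, start, end) triple of token indices.
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)
```

Note that REGEX is tested against the text of one token at a time, so a number written with spaces is split into several tokens and will not match a single-token pattern like this one.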
This exercise is part of the course “Natural Language Processing with spaCy”.
Exercise instructions
- Define a pattern to match phone numbers of the form 8888888888 to be used by the EntityRuler.
- Load a blank spaCy English model and add an EntityRuler component to the pipeline.
- Add the compiled pattern to the EntityRuler component.
- Run the model and print the tuple of text and type of entities for the given text.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
text = "Our phone number is 4251234567."
# Define a pattern to match phone numbers
patterns = [{"label": "PHONE_NUMBERS", "pattern": [{"TEXT": {"REGEX": "(____){____}"}}]}]
# Load a blank model and add an EntityRuler
nlp = spacy.____("en")
ruler = nlp.____("entity_ruler")
# Add the compiled patterns to the EntityRuler
ruler.____(patterns)
# Print a (text, type) tuple for each entity found in the given text
doc = ____(____)
print([(ent.____, ent.____) for ent in doc.____])
The course covers the core operations of spaCy and how to train models for natural language processing, extract information from unstructured data, and match patterns.
It introduces spaCy pipeline components, how to add a component to a pipeline, and how to analyze an NLP pipeline. It also covers several approaches to rule-based information extraction using the EntityRuler, Matcher, and PhraseMatcher classes in spaCy and regular expressions in Python.