RegEx with EntityRuler in spaCy

Regular expressions, or RegEx, are used for rule-based information extraction with complex string-matching patterns. A RegEx can retrieve the parts of a string that match a pattern, or replace them with other text. In this exercise, you will practice using EntityRuler in spaCy to find phone numbers in a given text.

The spaCy package is already imported for you. The metacharacter \d matches any single digit from 0 to 9.
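As a minimal sketch of how \d behaves outside spaCy, the snippet below uses Python's built-in re module on the same sentence used later in this exercise. The variable names and the `<PHONE>` placeholder are illustrative assumptions, not part of the exercise.

```python
import re

text = "Our phone number is 4251234567."

# \d matches one digit; {10} repeats it exactly ten times.
# \b marks word boundaries so a longer digit run is not partially matched.
phone_pattern = re.compile(r"\b\d{10}\b")

# Retrieve every substring that matches the pattern.
matches = phone_pattern.findall(text)
print(matches)

# Replace every match with a placeholder (hypothetical label, for illustration).
masked = phone_pattern.sub("<PHONE>", text)
print(masked)
```

This covers both uses mentioned above: retrieving matches and replacing them.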

A spaCy token pattern can apply the REGEX operator to a token attribute such as TEXT. In this case, the pattern has the shape [{"TEXT": {"REGEX": "<a given pattern>"}}].

This exercise is part of the course

Natural Language Processing with spaCy

Exercise instructions

Define a pattern to match phone numbers of the form 8888888888 to be used by the EntityRuler.
Load a blank spaCy English model and add an EntityRuler component to the pipeline.
Add the compiled pattern to the EntityRuler component.
Run the model on the given text and print a (text, label) tuple for each entity found.

Interactive exercise

Complete the sample code to successfully finish this exercise.

text = "Our phone number is 4251234567."

# Define a pattern to match phone numbers
patterns = [{"label": "PHONE_NUMBERS", "pattern": [{"TEXT": {"REGEX": "(____){____}"}}]}]

# Load a blank model and add an EntityRuler
nlp = spacy.____("en")
ruler = nlp.____("entity_ruler")

# Add the compiled patterns to the EntityRuler
ruler.____(patterns)

# Print a tuple of text and label for each entity in the given text
doc = ____(____)
print([(ent.____, ent.____) for ent in doc.____])
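One possible way to fill in the blanks is sketched below. It assumes spaCy v3, where `add_pipe` takes the component's string name and returns the component. The `entities` variable is added here only so the result can be inspected. The pattern `(\d){10}` is one reading of the `(____){____}` template, not necessarily the intended solution.

```python
import spacy

text = "Our phone number is 4251234567."

# Token pattern: the token's TEXT must contain ten consecutive digits.
# (Assumed completion of the template "(____){____}".)
patterns = [{"label": "PHONE_NUMBERS", "pattern": [{"TEXT": {"REGEX": r"(\d){10}"}}]}]

# A blank English pipeline has only a tokenizer; add the EntityRuler to it.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Register the patterns with the EntityRuler.
ruler.add_patterns(patterns)

# Run the pipeline and collect (text, label) for each entity.
doc = nlp(text)
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

Because the pattern is not anchored, a token with more than ten digits would likely match as well. Anchoring it as `^\d{10}$` is a stricter variant to consider.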

