LoslegenKostenlos loslegen

Matching a single term in spaCy

RegEx patterns are not trivial to read, write and debug. But you are not at a loss, spaCy provides a readable and production-level alternative, the Matcher class. The Matcher class can match predefined rules to a sequence of tokens in a given Doc container. In this exercise, you will practice using Matcher to find a single word.

You can access the corresponding text in example_text and use nlp and doc to access an spaCy model and Doc container of example_text respectively.

Diese Übung ist Teil des Kurses

Natural Language Processing with spaCy

Kurs anzeigen

Anleitung zur Übung

  • Initialize a Matcher class.
  • Define a pattern to match lower cased witch in the example_text.
  • Add the patterns to the Matcher class and find matches.
  • Iterate through matches and print start and end token indices and span of the matched text.

Interaktive Übung

Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.

nlp = spacy.load("en_core_web_sm")
doc = nlp(example_text)

# Initialize a Matcher object
matcher = Matcher(nlp.____)

# Define a pattern to match lower cased word witch
pattern = [{"lower" : ____}]

# Add the pattern to matcher object and find matches
matcher.add("CustomMatcher", [____])
matches = matcher(____)

# Print start and end token indices and span of the matched text
for match_id, start, end in matches:
    print("Start token: ", ____, " | End token: ", ____, "| Matched text: ", doc[____:____].text)
Code bearbeiten und ausführen