Matching a single term in spaCy
Regular expression (RegEx) patterns are hard to read, write, and debug. Fortunately, spaCy provides a readable, production-ready alternative: the Matcher class. A Matcher applies predefined rules to sequences of tokens in a given Doc container. In this exercise, you will practice using Matcher to find a single word.
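Below is a minimal sketch of the Matcher workflow described above. It assumes spaCy v3 is installed. It uses `spacy.blank("en")` instead of `en_core_web_sm`, so no trained model has to be downloaded. This works because `LOWER` is a lexical attribute that the tokenizer alone provides. The sample sentence and the rule name `"WITCH"` are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough, because "LOWER" is a
# lexical attribute and needs no trained components.
nlp = spacy.blank("en")
doc = nlp("The witch lived in a cottage. Every Witch in town knew her.")

# The Matcher shares the pipeline's vocabulary.
matcher = Matcher(nlp.vocab)

# A pattern is a list with one dictionary per token; this one matches
# a single token whose lowercased text is "witch".
pattern = [{"LOWER": "witch"}]
matcher.add("WITCH", [pattern])  # "WITCH" is a hypothetical rule name

# Calling the matcher returns (match_id, start, end) tuples; start and end
# are token indices, and end is exclusive.
matches = matcher(doc)
for match_id, start, end in matches:
    print(start, end, doc[start:end].text)
```

Running it prints `1 2 witch` and `8 9 Witch`. Both casings match because the pattern compares lowercased text.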
You can access the corresponding text in example_text, and use nlp and doc to access a spaCy model and the Doc container of example_text, respectively.
This exercise is part of the course Natural Language Processing with spaCy.
Exercise instructions
- Initialize a Matcher object.
- Define a pattern to match the lowercased word witch in example_text.
- Add the pattern to the Matcher object and find matches.
- Iterate through the matches and print the start and end token indices and the span of the matched text.
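The instructions ask for a pattern on the lowercased form of witch. The next sketch contrasts `LOWER` with `ORTH`, which compares the exact token text, to show why `LOWER` gives case-insensitive matching. It makes the same assumptions as before: spaCy v3 and a blank English pipeline. The sample text and the rule names `EXACT` and `ANY_CASE` are hypothetical.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
doc = nlp("Witch, witch, WITCH!")

matcher = Matcher(nlp.vocab)
# "ORTH" compares the exact token text, so only the lowercase spelling matches.
matcher.add("EXACT", [[{"ORTH": "witch"}]])
# "LOWER" compares the lowercased token text, so every casing matches.
matcher.add("ANY_CASE", [[{"LOWER": "witch"}]])

matches = matcher(doc)
for match_id, start, end in matches:
    # match_id is a hash; the shared StringStore maps it back to the rule name.
    print(nlp.vocab.strings[match_id], start, end, doc[start:end].text)
```

`EXACT` should match only the middle token ("witch"). `ANY_CASE` should match all three spellings.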
Have a go at this exercise by completing this sample code.
import spacy
from spacy.matcher import Matcher

# example_text is provided by the exercise environment
nlp = spacy.load("en_core_web_sm")
doc = nlp(example_text)
# Initialize a Matcher object
matcher = Matcher(nlp.____)
# Define a pattern to match the lowercased word witch
pattern = [{"lower": ____}]
# Add the pattern to the matcher object and find matches
matcher.add("CustomMatcher", [____])
matches = matcher(____)
# Print start and end token indices and span of the matched text
for match_id, start, end in matches:
    print("Start token: ", ____, " | End token: ", ____, "| Matched text: ", doc[____:____].text)
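Instead of slicing `doc[start:end]` by hand, you can pass `as_spans=True` to the Matcher in spaCy v3. It then returns Span objects, and their `start` and `end` attributes hold the same token indices as the tuples. The sketch below assumes spaCy v3.0 or later and a blank English pipeline. The sample sentence is invented.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
doc = nlp("A witch met another witch.")

matcher = Matcher(nlp.vocab)
matcher.add("CustomMatcher", [[{"LOWER": "witch"}]])

# as_spans=True returns Span objects instead of (match_id, start, end) tuples;
# each span's label_ holds the rule name passed to matcher.add().
spans = matcher(doc, as_spans=True)
for span in spans:
    # span.start and span.end are token indices; span.end is exclusive.
    print("Start token:", span.start, "| End token:", span.end,
          "| Matched text:", span.text, "| Rule:", span.label_)
```

The tuple form used in the exercise is still useful when you need the raw `match_id`. The span form saves the slicing step and keeps the rule name attached to each match.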