Matching with extended syntax in spaCy
Rule-based information extraction is a useful part of many NLP pipelines. The Matcher class lets patterns be more expressive by accepting operators inside the curly brackets of a token pattern. These operators provide extended comparisons and resemble Python's in, not in, and comparison operators. In this exercise, you will practice with spaCy's matching functionality, Matcher, to find matches for given terms in an example text.
The Matcher class is already imported from the spacy.matcher module. You will use the Doc container of an example text, available as doc. A pre-loaded spaCy model is also accessible as nlp.
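To make the extended syntax concrete, here is a minimal sketch of the IN, NOT_IN, and comparison operators. It assumes spaCy v3 (where Matcher.add takes a list of patterns) and uses a blank English pipeline, since these patterns rely only on token text. The example sentence and the rule names SIZE, NOT_SIZE, and LONG_WORD are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# because the patterns below only look at token text and length.
nlp = spacy.blank("en")
doc = nlp("I ordered a small coffee, a large tea and a huge slice of cake.")

matcher = Matcher(nlp.vocab)

# IN: the lowercase token text must be one of the listed values.
size_pattern = [{"LOWER": {"IN": ["small", "large"]}}, {"IS_ALPHA": True}]

# NOT_IN: the lowercase token text must not be one of the listed values.
not_size_pattern = [{"LOWER": "a"}, {"LOWER": {"NOT_IN": ["small", "large"]}}]

# Comparison operator: token length of at least 7 characters.
long_word_pattern = [{"LENGTH": {">=": 7}}]

# Hypothetical rule names, chosen only for this sketch.
matcher.add("SIZE", [size_pattern])
matcher.add("NOT_SIZE", [not_size_pattern])
matcher.add("LONG_WORD", [long_word_pattern])

# Sort by start index so the output order does not depend on the matcher.
results = sorted(
    (start, end, nlp.vocab.strings[match_id], doc[start:end].text)
    for match_id, start, end in matcher(doc)
)
for start, end, rule, text in results:
    print(rule, start, end, text)
```

Each operator sits in a nested dictionary that takes the place of a plain attribute value, which is what "inside the curly brackets" refers to.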
This exercise is part of the course Natural Language Processing with spaCy.
Exercise instructions
- Define a matcher object using Matcher and nlp.
- Use the IN operator to define a pattern to match tiny squares and tiny mouthful.
- Use this pattern to find matches for doc.
- Print start and end token indices and text span of the matches.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
nlp = spacy.load("en_core_web_sm")
doc = nlp(example_text)
# Define a matcher object
matcher = Matcher(nlp.____)
# Define a pattern to match tiny squares and tiny mouthful
pattern = [{"lower": ____}, {"lower": {____: ["squares", "mouthful"]}}]
# Add the pattern to matcher object and find matches
matcher.____("CustomMatcher", [____])
matches = ____(____)
# Print out start and end token indices and the matched text span per match
for match_id, start, end in matches:
    print("Start token: ", ____, " | End token: ", ____, "| Matched text: ", doc[____:____].text)
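
The tuples returned by a matcher can be read as follows. This is a sketch on a different, made-up sentence, assuming spaCy v3 and a blank English pipeline. The rule name BrightColor is hypothetical. The match_id is a hash that can be looked up in nlp.vocab.strings, and end is exclusive, so doc[start:end] is exactly the matched span.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: tokenizer-only pipeline; no trained model is required.
nlp = spacy.blank("en")
doc = nlp("She wore bright red shoes and a bright blue scarf.")

matcher = Matcher(nlp.vocab)
# Hypothetical rule: "bright" followed by one of two color words.
matcher.add(
    "BrightColor",
    [[{"LOWER": "bright"}, {"LOWER": {"IN": ["red", "blue"]}}]],
)

spans = []
for match_id, start, end in matcher(doc):
    rule_name = nlp.vocab.strings[match_id]  # hash -> rule name
    span = doc[start:end]                    # end index is exclusive
    spans.append((rule_name, start, end, span.text))
    print(rule_name, start, end, span.text)
```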