Get startedGet started for free

PhraseMatcher in spaCy

While processing unstructured text, you often have long lists and dictionaries that you want to scan and match in given texts. The Matcher patterns are handcrafted and each token needs to be coded individually. If you have a long list of phrases, Matcher is no longer the best option. In this instance, PhraseMatcher class helps us match long dictionaries. In this exercise, you will practice to retrieve patterns with matching shapes to multiple terms using PhraseMatcher class.

en_core_web_sm model is already loaded and ready for you to use as nlp. PhraseMatcher class is imported. A text string and a list of terms are available for your use.

This exercise is part of the course

Natural Language Processing with spaCy

View Course

Exercise instructions

  • Initialize a PhraseMatcher class with an attr to match to shape of given terms.
  • Create patterns to add to the PhraseMatcher object.
  • Find matches to the given patterns and print start and end token indices and matching section of the given text.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

text = "There are only a few acceptable IP addresse: (1) 127.100.0.1, (2) 123.4.1.0."
terms = ["110.0.0.0", "101.243.0.0"]

# Initialize a PhraseMatcher class to match to shapes of given terms
matcher = ____(nlp.____, attr = ____)

# Create patterns to add to the PhraseMatcher object
patterns = [nlp.make_doc(____) for term in terms]
matcher.____("IPAddresses", patterns)

# Find matches to the given patterns and print start and end characters and matches texts
doc = ____
matches = ____
for match_id, start, end in matches:
    print("Start token: ", ____, " | End token: ", ____, "| Matched text: ", doc[____:____].text)
Edit and Run Code