PhraseMatcher in spaCy

When processing unstructured text, you often have long lists or dictionaries of terms that you want to find in a given text. Matcher patterns are handcrafted, and each token has to be specified individually, so Matcher is a poor fit for a long list of phrases. The PhraseMatcher class is meant for this case: it matches against large lists of terms. In this exercise, you will practice using PhraseMatcher to find spans of text whose token shapes match the shapes of a list of terms.

The en_core_web_sm model is already loaded and available as nlp. The PhraseMatcher class is imported. A text string and a list of terms are available for you to use.

Initialize a PhraseMatcher object with the attr argument set so that it matches on the shape of the given terms.
Create patterns from the terms and add them to the PhraseMatcher object.
Find matches for the given patterns and print the start and end token indices along with the matching span of the given text.
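
The three steps can be sketched as follows. This is a minimal illustration, not the exercise solution: the text and terms here are hypothetical sample data, and a blank English pipeline stands in for the preloaded nlp object. That should be enough here because SHAPE is a lexical attribute available without a trained model.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Stand-in for the preloaded `nlp` (assumption: a blank English pipeline
# is enough, since token shapes do not require a trained model).
nlp = spacy.blank("en")

# Hypothetical sample data; the exercise supplies its own text and terms.
text = "The router may use 192.168.1.1 or 192.168.2.1 as its address."
terms = ["127.0.0.1", "127.127.0.0"]

# Step 1: match on token shape rather than on the exact token text.
matcher = PhraseMatcher(nlp.vocab, attr="SHAPE")

# Step 2: turn each term into a Doc and register the Docs as patterns.
patterns = [nlp.make_doc(term) for term in terms]
matcher.add("IP_ADDRESS", patterns)

# Step 3: run the matcher and report each match as (start, end, span text).
doc = nlp(text)
found = []
for match_id, start, end in matcher(doc):
    found.append((start, end, doc[start:end].text))
    print("Start token:", start, "| End token:", end, "| Span:", doc[start:end].text)
```

Neither address in the sample text appears in the term list. They match because they share a shape (digit groups separated by periods) with one of the terms, which is the point of setting attr to "SHAPE".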
