Efficient phrase matching

Sometimes it's more efficient to match exact strings instead of writing patterns describing the individual tokens. This is especially true for finite categories of things – like all countries of the world.

We already have a list of countries, so let's use this as the basis of our information extraction script. A list of string names is available as the variable COUNTRIES. The nlp object and a test doc have already been created and the doc.text has been printed to the shell.

Latihan ini adalah bagian dari kursus

Advanced NLP with spaCy

Lihat Kursus

Petunjuk latihan

Import the PhraseMatcher and initialize it with the shared vocab as the variable matcher.
Add the phrase patterns and call the matcher on the doc.

Latihan interaktif praktis

Cobalah latihan ini dengan menyelesaikan kode contoh berikut.

# Import the PhraseMatcher and initialize it
from spacy.____ import ____
matcher = ____(____)

# Create pattern Doc objects and add them to the matcher
# This is the faster version of: [nlp(country) for country in COUNTRIES]
patterns = list(nlp.pipe(COUNTRIES))
matcher.add('COUNTRY', None, *patterns)

# Call the matcher on the test document and print the result
matches = ____(____)
print([doc[start:end] for match_id, start, end in matches])

Edit dan Jalankan Kode