Span similarity with spaCy
Determining semantic similarity can help you categorize texts into predefined categories, detect relevant texts, or flag duplicate content. In this exercise, you will practice calculating the semantic similarity between spans of a document and another document. The goal is to find the three-token Span that is most relevant to canned dog food.
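For context, spaCy's documentation describes the default Span.similarity estimate as cosine similarity over averaged word vectors. With a model that ships word vectors, such as en_core_web_md, a span's vector is by default the mean of its n token vectors w_i, and a span A is compared with another span or Doc B roughly as follows:

```latex
\vec{v}_{A} = \frac{1}{n} \sum_{i=1}^{n} \vec{w}_i,
\qquad
\operatorname{sim}(A, B) = \frac{\vec{v}_A \cdot \vec{v}_B}{\lVert \vec{v}_A \rVert \, \lVert \vec{v}_B \rVert}
```

A score near 1 means the averaged vectors point in almost the same direction; token order inside a span does not affect its averaged vector.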
The category string "canned dog food" is stored in the variable category. A text string is already stored in text, the en_core_web_md model is loaded as nlp, and the Doc container of text has already been created and stored in document.
This exercise is part of the course Natural Language Processing with spaCy.
Exercise instructions
- Create a Doc container for category and store it in category_document.
- Print the similarity score of the given Span and category_document, rounded to three decimal places.
A completed version of the sample code follows. It assumes nlp and document are already defined, as described above.
# nlp (en_core_web_md) and document are preloaded by the exercise environment
# Create a Doc container for the category
category = "canned dog food"
category_document = nlp(category)
# Print the similarity score of a given Span and category_document
document_span = document[0:3]
similarity = document_span.similarity(category_document)
print(f"Semantic similarity with {document_span.text}: {round(similarity, 3)}")
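The code above scores only the first span, document[0:3]. To actually find the most relevant three-token span, one option is to score every window, which in spaCy would mean calling document[i:i + 3].similarity(category_document) for each i in range(len(document) - 2) and keeping the maximum. The sketch below reproduces that search without spaCy so the arithmetic is visible. It is a minimal illustration under stated assumptions: the token list and the 3-dimensional TOY_VECTORS table are hypothetical stand-ins for the model's 300-dimensional vectors, and mean_vector, cosine, and most_similar_span are illustrative helpers, not spaCy functions.

```python
import math

# Hypothetical 3-dimensional toy vectors (assumption); en_core_web_md uses 300 dimensions.
TOY_VECTORS = {
    "my": [0.1, 0.0, 0.2],
    "dog": [0.9, 0.1, 0.0],
    "loves": [0.2, 0.3, 0.1],
    "canned": [0.7, 0.2, 0.1],
    "food": [0.8, 0.3, 0.0],
    "daily": [0.0, 0.1, 0.9],
}


def mean_vector(tokens):
    """Average the toy vectors of the given tokens (mirrors spaCy's default Span.vector)."""
    dims = len(TOY_VECTORS[tokens[0]])
    total = [0.0] * dims
    for tok in tokens:
        for i, value in enumerate(TOY_VECTORS[tok]):
            total[i] += value
    return [value / len(tokens) for value in total]


def cosine(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_span(tokens, category_tokens, width=3):
    """Slide a window of `width` tokens; return (start index, score) of the best window."""
    target = mean_vector(category_tokens)
    scores = [
        (start, cosine(mean_vector(tokens[start:start + width]), target))
        for start in range(len(tokens) - width + 1)
    ]
    # max() keeps the first window if two scores tie
    return max(scores, key=lambda pair: pair[1])


tokens = ["my", "dog", "loves", "canned", "dog", "food", "daily"]
start, score = most_similar_span(tokens, ["canned", "dog", "food"])
print(tokens[start:start + 3], round(score, 3))
```

Here the window "canned dog food" wins because its averaged vector matches the category exactly; with real text, the best span is simply the window whose averaged vector points closest to the category's.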