
Span similarity with spaCy

Determining semantic similarity can help you sort texts into predefined categories, detect relevant texts, or flag duplicate content. In this exercise, you will practice calculating the semantic similarity between spans of a document and a given category document. The goal is to find the three-token Span that is most relevant to canned dog food.

The category, canned dog food, is stored in category. A text string is already stored in text, and the en_core_web_md model is loaded as nlp. The Doc container for the text has also been created and is stored in document.
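By default, spaCy estimates similarity as the cosine similarity between averaged word vectors. The sketch below illustrates that idea in plain Python. The three-dimensional vectors are invented for illustration and are not taken from en_core_web_md, and the helper names mean_vector and cosine_similarity are hypothetical, not part of spaCy.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption of this sketch: treat an all-zero vector as "no similarity"
        return 0.0
    return dot / (norm_a * norm_b)


# Toy word vectors, made up for this example only
toy_vectors = {
    "canned": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.2],
    "food": [0.8, 0.3, 0.1],
    "puppy": [0.2, 0.8, 0.3],
    "kibble": [0.7, 0.4, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

# A multi-word text is represented by the mean of its word vectors
category_vector = mean_vector([toy_vectors[w] for w in ("canned", "dog", "food")])
related_vector = mean_vector([toy_vectors[w] for w in ("puppy", "kibble")])
unrelated_vector = mean_vector([toy_vectors["invoice"]])

related_score = cosine_similarity(category_vector, related_vector)
unrelated_score = cosine_similarity(category_vector, unrelated_vector)
print(round(related_score, 3), round(unrelated_score, 3))
```

With these toy values, the text about puppies and kibble scores closer to the category than the text about invoices, which is the behavior the exercise relies on.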

This exercise is part of the course

Natural Language Processing with spaCy


Exercise instructions

  • Create a Doc container for the category and store it in category_document.
  • Print the similarity score between the given Span and category_document, rounded to three decimal places.

Interactive exercise

Try this exercise by completing this sample code.

# Create a Doc container for the category
category = "canned dog food"
category_document = nlp(____)

# Print similarity score of a given Span and category_document
document_span = document[0:3]
print("Semantic similarity with", document_span.text, ":", round(document_span.____(____), 3))
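
The exercise scores a single span, document[0:3]. To pursue the stated goal of finding the most relevant three-token Span, one option is to slide a window over the whole document and keep the best score. The following is a minimal sketch of that idea, not the course's solution: the function name most_similar_span and its size parameter are hypothetical, and it only assumes that document supports len() and slicing and that each slice has a similarity method, as spaCy's Doc and Span objects do.

```python
def most_similar_span(document, category_document, size=3):
    """Return (best_span, best_score) over all contiguous spans of `size` tokens.

    If the document has fewer than `size` tokens, no window exists and the
    result is (None, float("-inf")).
    """
    best_span, best_score = None, float("-inf")
    # The last valid window starts at len(document) - size
    for start in range(len(document) - size + 1):
        span = document[start:start + size]
        score = span.similarity(category_document)
        if score > best_score:
            best_span, best_score = span, score
    return best_span, best_score
```

With the objects from this exercise, calling span, score = most_similar_span(document, category_document) and then printing span.text with round(score, 3) would report the winning span. Ties are resolved in favor of the earliest span because the comparison is strict.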