
Annotating and preparing training data

After collecting data, you annotate it in the format that a spaCy model requires. In this exercise, you will practice building a correctly annotated data record for an NER task in the medical domain.

A sentence (text) and two entities are available for you to use: entity_1, with the text chest pain and the type SYMPTOM, and entity_2, with the text hyperthyroidism and the type DISEASE.

This exercise is part of the course

Natural Language Processing with spaCy


Exercise instructions

  • Complete the annotated_data record in the correct format.
  • Find the start and end character offsets of each entity and store them in the corresponding variables.
  • Store the same input sentence and its entities in the proper training format as training_data.
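The offset step can be sketched in plain Python. This is a minimal illustration rather than the exercise solution: the sentence, the entity strings, and the helper name find_span are assumptions invented for this example. It relies on str.find, which returns the index of the first occurrence of a substring, or -1 when the substring is absent. The end offset is exclusive, as in Python slicing.

```python
def find_span(text, entity_text):
    """Return (start_char, end_char) for the first occurrence of entity_text.

    find_span is a hypothetical helper, not part of spaCy.
    """
    start = text.find(entity_text)
    if start == -1:
        # str.find signals "not found" with -1; fail loudly instead of
        # producing a nonsensical span.
        raise ValueError(f"{entity_text!r} not found in text")
    return start, start + len(entity_text)


# Made-up example sentence (not the one used in the exercise).
example_text = "The patient reported nausea after starting treatment for asthma."

# Each entity becomes a (start_char, end_char, label) tuple.
spans = [
    find_span(example_text, "nausea") + ("SYMPTOM",),
    find_span(example_text, "asthma") + ("DISEASE",),
]

# One training record: (sentence, {"entities": [...]}).
example_training_data = [(example_text, {"entities": spans})]
print(example_training_data)
```

Because str.find only reports the first match, a sentence that mentions the same entity text twice would need extra handling, which this sketch leaves out.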

Hands-on interactive exercise

Try this exercise by completing this sample code.

text = "A patient with chest pain had hyperthyroidism."
entity_1 = "chest pain"
entity_2 = "hyperthyroidism"

# Store annotated data information in the correct format
annotated_data = {"sentence": ____, "entities": [{"label": "SYMPTOM", "value": ____}, {"label": "DISEASE", "value": ____}]}

# Extract start and end characters of each entity
entity_1_start_char = text.____(____)
entity_1_end_char = entity_1_start_char + len(____)
entity_2_start_char = text.____(____)
entity_2_end_char = entity_2_start_char + len(____)

# Store the same input information in the proper format for training
training_data = [(____, {"entities": [(____,____,"SYMPTOM"), 
                                      (____,____,"DISEASE")]})]
print(training_data)
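A wrong offset produces a misaligned annotation without raising an error, so it can help to slice each span back out of its sentence and compare it with the intended entity text. The sketch below assumes records shaped like training_data above; the function name entity_texts and the sample record are hypothetical and exist only for illustration.

```python
def entity_texts(training_data):
    """Slice every annotated span back out of its sentence.

    Returns a list of (substring, label) pairs so the annotations can be
    inspected by eye or compared against the intended entity strings.
    """
    recovered = []
    for sentence, annotations in training_data:
        for start, end, label in annotations["entities"]:
            # Offsets must lie inside the sentence and describe a
            # non-empty span; the end offset is exclusive.
            if not 0 <= start < end <= len(sentence):
                raise ValueError(f"bad span ({start}, {end}) for {label}")
            recovered.append((sentence[start:end], label))
    return recovered


# Hand-written sample record (an assumption, not the exercise data).
sample = [
    (
        "Fever is common with influenza.",
        {"entities": [(0, 5, "SYMPTOM"), (21, 30, "DISEASE")]},
    )
]

for substring, label in entity_texts(sample):
    print(label, "->", substring)
```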