
Training a spaCy model from scratch

spaCy provides a clean and efficient way to train your own models. In this exercise, you will train an NER model from scratch on a real-world corpus (CORD-19 data).

The training data is available in the right format as training_data. You will use a given list of labels ("Pathogen", "MedicalCondition", "Medicine"), stored in labels, together with a blank English model (nlp) that has an NER component. The intended medical labels are added to the NER pipeline, and you can then train the model for one epoch. You can use the pre-imported Example class to convert the training data to the required format. To track model training, you can pass a losses dictionary to the .update() method and review the training loss.
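The exercise does not show what training_data looks like, so the sketch below is only an illustration. It assumes the common spaCy layout of (text, annotation) pairs, where the annotation is a dictionary with an "entities" list of (start, end, label) character offsets. The sentences and the annotate helper are hypothetical and are not part of the CORD-19 data.

```python
# Hypothetical samples; the real exercise supplies `training_data` from CORD-19.
labels = ["Pathogen", "MedicalCondition", "Medicine"]


def annotate(text, spans):
    """Build a (text, annotation) pair by locating each phrase in the text.

    `spans` is a list of (phrase, label) pairs. This helper is illustrative
    only and assumes each phrase occurs in the text.
    """
    entities = []
    for phrase, label in spans:
        start = text.index(phrase)  # character offset of the first match
        entities.append((start, start + len(phrase), label))
    return text, {"entities": entities}


training_data = [
    annotate(
        "Aspirin can relieve symptoms of influenza.",
        [("Aspirin", "Medicine"), ("influenza", "MedicalCondition")],
    ),
    annotate(
        "Coronavirus infection can cause pneumonia.",
        [("Coronavirus", "Pathogen"), ("pneumonia", "MedicalCondition")],
    ),
]

# Sanity check: every offset pair should slice out the intended phrase,
# and every label should be one of the given labels.
for text, annotation in training_data:
    for start, end, label in annotation["entities"]:
        assert label in labels
        print(f"{text[start:end]!r} -> {label}")
```

Slicing the text with each offset pair is a cheap way to catch off-by-one errors before training, since misaligned spans tend to be dropped or flagged when the data is converted.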

This exercise is part of the course

Natural Language Processing with spaCy


Exercise instructions

  • Create a blank spaCy model and add an NER component to the model.
  • Disable the other pipeline components, use the created optimizer object, and update the model weights using the data converted to the Example format.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Load a blank English model, add NER component, add given labels to the ner pipeline
nlp = spacy.____("____")
ner = nlp.____("ner")
for ent in labels:
    ner.add_label(ent)

# Disable other pipeline components, then complete and run the training loop
other_pipes = [____ for pipe in nlp.____ if ____ != "____"]
nlp.disable_pipes(*____)
losses = {}
optimizer = nlp.begin_training()
for text, annotation in training_data:
    doc = nlp.____(text)
    example = Example.____(doc, annotation)
    nlp.____([example], sgd=____, losses=losses)
    print(losses)
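For reference, here is one way the steps in the template might look when written out in full. This is a sketch, not the course's official solution. It assumes spaCy v3 and uses nlp.select_pipes and nlp.initialize, the v3 names for the disable_pipes and begin_training calls in the template. The two training sentences and the annotate helper are hypothetical stand-ins for the exercise's training_data.

```python
import spacy
from spacy.training import Example

labels = ["Pathogen", "MedicalCondition", "Medicine"]


def annotate(text, spans):
    """Hypothetical helper: turn (phrase, label) pairs into character offsets."""
    entities = []
    for phrase, label in spans:
        start = text.index(phrase)
        entities.append((start, start + len(phrase), label))
    return text, {"entities": entities}


# Hypothetical stand-in for the exercise's CORD-19 `training_data`.
training_data = [
    annotate(
        "Aspirin can relieve symptoms of influenza.",
        [("Aspirin", "Medicine"), ("influenza", "MedicalCondition")],
    ),
    annotate(
        "Coronavirus infection can cause pneumonia.",
        [("Coronavirus", "Pathogen"), ("pneumonia", "MedicalCondition")],
    ),
]

# Blank English pipeline with a fresh NER component and the given labels.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for ent in labels:
    ner.add_label(ent)

# Every component except NER (an empty list for this blank pipeline).
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]

losses = {}
with nlp.select_pipes(disable=other_pipes):
    # Initialize the weights and get an optimizer.
    optimizer = nlp.initialize()
    # One epoch: a single pass over the training data.
    for text, annotation in training_data:
        doc = nlp.make_doc(text)  # tokenize only, no predictions
        example = Example.from_dict(doc, annotation)
        nlp.update([example], sgd=optimizer, losses=losses)
        print(losses)  # cumulative loss, keyed by component name
```

Because the same losses dictionary is passed to every .update() call, the printed value accumulates over the epoch rather than showing a per-example loss.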