
Training a spaCy model from scratch

spaCy provides a clean and efficient way to train your own models. In this exercise, you will train an NER model from scratch on a real-world corpus (CORD-19 data).

The training data is available in the required format as training_data. You will work with a blank English model (nlp) that has an NER component, and with a given list of labels ("Pathogen", "MedicalCondition", "Medicine") stored in labels. The medical labels are added to the NER pipeline, after which you can train the model for one epoch. Use the pre-imported Example class to convert the training data to the required format. To track training, pass a losses dictionary to the .update() method and review the training loss.
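The exercise does not display the contents of training_data. As a rough illustration only, the sketch below assumes the common spaCy convention of (text, {"entities": [(start, end, label)]}) tuples with character offsets. The sample sentence and the entity_spans helper are hypothetical and not part of the exercise; the helper simply checks that each offset pair selects the intended text and that each label appears in labels.

```python
# Labels as given in the exercise.
labels = ["Pathogen", "MedicalCondition", "Medicine"]

# Hypothetical sample in the ASSUMED (text, annotations) layout;
# the real training_data in the exercise may differ.
sample_data = [
    (
        "Patients with pneumonia were given azithromycin.",
        {"entities": [(14, 23, "MedicalCondition"), (35, 47, "Medicine")]},
    ),
]


def entity_spans(data, allowed_labels):
    """Return (surface text, label) pairs; raise on bad offsets or unknown labels."""
    spans = []
    for text, annotation in data:
        for start, end, label in annotation["entities"]:
            if label not in allowed_labels:
                raise ValueError(f"Unknown label: {label!r}")
            if not 0 <= start < end <= len(text):
                raise ValueError(f"Offsets out of range: ({start}, {end})")
            # Slicing with the character offsets recovers the entity text.
            spans.append((text[start:end], label))
    return spans


print(entity_spans(sample_data, labels))
# → [('pneumonia', 'MedicalCondition'), ('azithromycin', 'Medicine')]
```

A check like this is useful before training because offsets that do not line up with the intended words tend to produce silently ignored or misaligned entities.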

This exercise is part of the course

Natural Language Processing with spaCy


Instructions

  • Create a blank spaCy model and add an NER component to the model.
  • Disable the other pipeline components, then use the created optimizer object to update the model weights with the training data converted to the Example format.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Load a blank English model, add NER component, add given labels to the ner pipeline
nlp = spacy.____("____")
ner = nlp.____("ner")
for ent in labels:
    ner.add_label(ent)

# Disable other pipeline components, then complete and run the training loop
other_pipes = [____ for pipe in nlp.____ if ____ != "____"]
nlp.disable_pipes(*____)
losses = {}
optimizer = nlp.begin_training()
for text, annotation in training_data:
    doc = nlp.____(text)
    example = Example.____(doc, annotation)
    nlp.____([example], sgd=____, losses=losses)
    print(losses)
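For comparison, here is a minimal self-contained sketch of the same one-epoch loop, assuming spaCy v3 is installed. It uses the v3 names nlp.select_pipes and nlp.initialize in place of disable_pipes and begin_training, which the template above uses. The two training sentences and their character offsets are hypothetical stand-ins for the exercise's training_data, and the (text, {"entities": [...]}) layout is an assumption.

```python
import spacy
from spacy.training import Example

# Labels as given in the exercise.
labels = ["Pathogen", "MedicalCondition", "Medicine"]

# Hypothetical stand-in for the exercise's training_data (assumed layout).
training_data = [
    (
        "Patients with pneumonia were given azithromycin.",
        {"entities": [(14, 23, "MedicalCondition"), (35, 47, "Medicine")]},
    ),
    (
        "Influenza virus can cause severe pneumonia.",
        {"entities": [(0, 15, "Pathogen"), (33, 42, "MedicalCondition")]},
    ),
]

# Blank English pipeline with a fresh NER component and the given labels.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for label in labels:
    ner.add_label(label)

losses = {}
# Keep only "ner" active; in a blank pipeline nothing else exists,
# so this is a safeguard rather than a necessity.
with nlp.select_pipes(enable=["ner"]):
    optimizer = nlp.initialize()
    for text, annotation in training_data:
        doc = nlp.make_doc(text)  # tokenize only, no predictions
        example = Example.from_dict(doc, annotation)
        nlp.update([example], sgd=optimizer, losses=losses)

# The loss is accumulated under the component name across all updates.
print(losses)
```

Printing losses once after the loop shows the total for the epoch, whereas printing inside the loop, as in the template, shows the running total after each example.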