Training a spaCy model from scratch

spaCy provides a clean and efficient approach to training your own models. In this exercise, you will train an NER model from scratch on a real-world corpus (CORD-19 data).

Training data is available in the right format as training_data. You will use a given list of labels ("Pathogen", "MedicalCondition", "Medicine"), stored in labels, together with a blank English model (nlp) that has an NER component. The medical labels are added to the NER component, and you then train the model for one epoch. Use the pre-imported Example class to convert the training data to the required format. To track model training, pass a losses dictionary to the .update() method and review the training loss.
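The exercise does not show what training_data looks like, so the following is only a sketch of the shape it is assumed to have: a list of (text, annotation) pairs where each annotation is a dictionary with an "entities" list of (start, end, label) character offsets. The sentence and the entity_spans helper are made up for illustration and are not part of the course data.

```python
labels = ["Pathogen", "MedicalCondition", "Medicine"]

# Assumed shape of one training item: (text, {"entities": [(start, end, label), ...]}).
# The sentence below is invented for illustration only.
training_data = [
    (
        "Influenza virus can cause pneumonia.",
        {"entities": [(0, 15, "Pathogen"), (26, 35, "MedicalCondition")]},
    ),
]


def entity_spans(text, annotation, allowed_labels):
    """Hypothetical helper: return (surface text, label) for each annotated entity.

    Raises ValueError if a label is unknown or an offset falls outside the text.
    """
    spans = []
    for start, end, label in annotation["entities"]:
        if label not in allowed_labels:
            raise ValueError(f"Unknown label: {label}")
        if not 0 <= start < end <= len(text):
            raise ValueError(f"Offsets out of range: {(start, end)}")
        spans.append((text[start:end], label))
    return spans


for text, annotation in training_data:
    print(entity_spans(text, annotation, labels))
```

Printing the sliced spans is a quick way to confirm that the character offsets point at the intended words before any training starts.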

This exercise is part of the course

Natural Language Processing with spaCy

Exercise instructions

  • Create a blank spaCy model and add an NER component to the model.
  • Disable the other pipeline components, convert the training data to the Example format, and update the model weights using the created optimizer object.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load a blank English model, add NER component, add given labels to the ner pipeline
nlp = spacy.____("____")
ner = nlp.____("ner")
for ent in labels:
    ner.add_label(ent)

# Disable other pipeline components, complete training loop and run training loop
other_pipes = [____ for pipe in nlp.____ if ____ != "____"]
nlp.disable_pipes(*____)
losses = {}
optimizer = nlp.begin_training()
for text, annotation in training_data:
    doc = nlp.____(text)
    example = Example.____(doc, annotation)
    nlp.____([example], sgd=____, losses=losses)
    print(losses)
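For comparison, here is a minimal end-to-end sketch of one training epoch, assuming spaCy v3 is installed. It is not the course's reference solution. The two sentences and the annotate helper are invented for illustration, and the sketch uses nlp.initialize() and nlp.select_pipes(), the v3 names for the begin_training() and disable_pipes() calls that appear in the exercise code.

```python
import spacy
from spacy.training import Example

labels = ["Pathogen", "MedicalCondition", "Medicine"]


def annotate(text, phrases):
    """Hypothetical helper: build an annotation dict from (phrase, label) pairs."""
    entities = []
    for phrase, label in phrases:
        start = text.index(phrase)
        entities.append((start, start + len(phrase), label))
    return {"entities": entities}


# Invented stand-in for the course's training_data (assumed format).
raw = [
    ("Influenza virus can cause pneumonia.",
     [("Influenza virus", "Pathogen"), ("pneumonia", "MedicalCondition")]),
    ("Patients with pneumonia were given azithromycin.",
     [("pneumonia", "MedicalCondition"), ("azithromycin", "Medicine")]),
]
training_data = [(text, annotate(text, phrases)) for text, phrases in raw]

# Blank English pipeline with a fresh NER component and the target labels.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for label in labels:
    ner.add_label(label)

# Train only the NER component; a blank pipeline has no other pipes,
# so this list is empty here, but the pattern matches the exercise.
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
losses = {}
with nlp.select_pipes(disable=other_pipes):
    optimizer = nlp.initialize()
    for text, annotation in training_data:
        doc = nlp.make_doc(text)  # tokenize only, without running the pipeline
        example = Example.from_dict(doc, annotation)
        nlp.update([example], sgd=optimizer, losses=losses)

print(losses)
```

The losses dictionary is keyed by component name, so after the loop it holds the accumulated NER loss under "ner". With only two sentences and a single pass, the value says little about model quality; it is useful mainly for checking that the loop runs.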