Training a spaCy model from scratch
spaCy provides a clean and efficient way to train your own models. In this exercise, you will train an NER model from scratch on a real-world corpus (CORD-19 data).
Training data is available in the right format as training_data. You will use a given list of labels ("Pathogen", "MedicalCondition", "Medicine"), stored in labels, with a blank English model (nlp) that has an NER component. The intended medical labels will be added to the NER pipeline, and then you can train the model for one epoch. You can use the pre-imported Example class to convert the training data to the required format. To track model training, you can pass a losses dictionary to the .update() method and review the training loss.
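The exercise does not show what training_data looks like. A minimal sketch, assuming the common spaCy NER layout of (text, {"entities": [(start, end, label)]}) tuples with character offsets: the two sentences and the check_annotations helper below are hypothetical and only illustrate how such data can be sanity-checked before training.

```python
# Labels named in the exercise text
labels = ["Pathogen", "MedicalCondition", "Medicine"]

# Hypothetical sample in the assumed (text, {"entities": [...]}) layout;
# the real training_data is preloaded by the exercise environment.
training_data = [
    ("Aspirin can relieve fever.",
     {"entities": [(0, 7, "Medicine"), (20, 25, "MedicalCondition")]}),
    ("Coronavirus can cause pneumonia.",
     {"entities": [(0, 11, "Pathogen"), (22, 31, "MedicalCondition")]}),
]


def check_annotations(data, allowed_labels):
    """Return a list of human-readable problems found in the annotations."""
    problems = []
    for text, annotation in data:
        for start, end, label in annotation["entities"]:
            if label not in allowed_labels:
                problems.append(f"unknown label {label!r} in {text!r}")
            if not (0 <= start < end <= len(text)):
                problems.append(f"bad span ({start}, {end}) in {text!r}")
    return problems


# Print each annotated span so the offsets can be checked by eye
for text, annotation in training_data:
    for start, end, label in annotation["entities"]:
        print(label, "->", text[start:end])

print(check_annotations(training_data, labels))
```

Offsets are end-exclusive, so text[start:end] should reproduce exactly the entity string.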
This exercise is part of the course Natural Language Processing with spaCy.
Exercise instructions
- Create a blank spaCy model and add an NER component to the model.
- Disable other pipeline components, use the created optimizer object, and update the model weights using the data converted to the Example format.
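To illustrate what disabling other pipeline components means, here is a small sketch assuming spaCy v3. The sentencizer is added purely for illustration so that there is something to disable (a blank model with only an NER component has no other pipes), and nlp.select_pipes is used as a context manager so the disabled components are restored afterwards.

```python
import spacy

# Blank English pipeline with one extra component, added only for illustration
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe("ner")

# Every component except the one being trained
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]

# Inside the block only "ner" is active; the others are restored on exit
with nlp.select_pipes(disable=other_pipes):
    active_during_training = list(nlp.pipe_names)

active_afterwards = list(nlp.pipe_names)
print(active_during_training, active_afterwards)
```

Disabling the other components keeps their weights untouched while only the NER component is updated.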
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Load a blank English model, add NER component, add given labels to the ner pipeline
nlp = spacy.____("____")
ner = nlp.____("ner")
for ent in labels:
    ner.add_label(ent)

# Disable other pipeline components, complete training loop and run training loop
other_pipes = [____ for pipe in nlp.____ if ____ != "____"]
nlp.disable_pipes(*____)
losses = {}
optimizer = nlp.begin_training()
for text, annotation in training_data:
    doc = nlp.____(text)
    example = Example.____(doc, annotation)
    nlp.____([example], sgd=____, losses=losses)
    print(losses)
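One possible completion of the template, offered as a sketch rather than the official solution. It assumes spaCy v3, where Example lives in spacy.training; recent v3 releases treat begin_training and disable_pipes as deprecated aliases of initialize and select_pipes, but they are kept here to mirror the template. The two training sentences are hypothetical stand-ins for the preloaded training_data.

```python
import spacy
from spacy.training import Example

# Hypothetical stand-ins for the variables the exercise preloads
labels = ["Pathogen", "MedicalCondition", "Medicine"]
training_data = [
    ("Aspirin can relieve fever.",
     {"entities": [(0, 7, "Medicine"), (20, 25, "MedicalCondition")]}),
    ("Coronavirus can cause pneumonia.",
     {"entities": [(0, 11, "Pathogen"), (22, 31, "MedicalCondition")]}),
]

# Blank English model with an NER component and the given labels
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for ent in labels:
    ner.add_label(ent)

# Disable everything except NER (an empty list for this blank pipeline)
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
nlp.disable_pipes(*other_pipes)

# One pass over the data, i.e. a single epoch
losses = {}
optimizer = nlp.begin_training()
for text, annotation in training_data:
    doc = nlp.make_doc(text)                      # tokenize only
    example = Example.from_dict(doc, annotation)  # attach gold entities
    nlp.update([example], sgd=optimizer, losses=losses)
    print(losses)  # running NER loss, accumulated under the "ner" key
```

Because the same losses dictionary is passed to every update call, the value under "ner" accumulates over the epoch instead of resetting per example.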