Training a spaCy model from scratch

spaCy provides a clean and efficient way to train your own models. In this exercise, you will train an NER model from scratch on a real-world corpus (CORD-19 data).

The training data is available in the required format as training_data. You will use a blank English model (nlp) with an NER component and a given list of labels ("Pathogen", "MedicalCondition", "Medicine") stored in labels. After adding these medical labels to the NER component, you can train the model for one epoch. Use the pre-imported Example class to convert the training data to the format that training expects. To track training, pass a losses dictionary to the .update() method and review the training loss.
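A minimal sketch of the setup described above, assuming spaCy v3 and assuming training_data uses the common (text, {"entities": [(start_char, end_char, label)]}) annotation format. The two sentences are hypothetical stand-ins, not CORD-19 excerpts; only labels, nlp and the Example class come from the exercise itself.

```python
import spacy
from spacy.training import Example

# Hypothetical stand-in for the exercise's training_data: each item is
# (text, {"entities": [(start_char, end_char, label), ...]}).
training_data = [
    ("Patients infected with SARS-CoV-2 may develop pneumonia.",
     {"entities": [(23, 33, "Pathogen"), (46, 55, "MedicalCondition")]}),
    ("Remdesivir was given to treat COVID-19.",
     {"entities": [(0, 10, "Medicine"), (30, 38, "MedicalCondition")]}),
]
labels = ["Pathogen", "MedicalCondition", "Medicine"]

# Blank English pipeline with a single NER component.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")

# Register the medical labels with the NER component.
for label in labels:
    ner.add_label(label)
print(ner.labels)

# Convert each (text, annotations) pair into an Example object.
examples = []
for text, annotations in training_data:
    doc = nlp.make_doc(text)  # tokenizes only; no pipeline components run
    examples.append(Example.from_dict(doc, annotations))

# The gold-standard entities live on the Example's reference Doc.
print([(ent.text, ent.label_) for ent in examples[0].reference.ents])
```

Using nlp.make_doc rather than nlp(text) keeps the untrained NER component from running on the input; the Example then pairs this unannotated Doc with the gold annotations.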

Create a blank spaCy model and add an NER component to the model.
Disable the other pipeline components, use the created optimizer object, and update the model weights with the training data converted to the Example format.
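The steps above could look roughly like this for one epoch. This is a sketch assuming spaCy v3 and hypothetical sample data; it calls nlp.initialize(), which initializes the NER weights and returns an optimizer, so the exercise's own optimizer object may be created differently. A blank model contains only the NER component, so other_pipes is empty here, but select_pipes keeps the pattern valid for pipelines with more components.

```python
import random

import spacy
from spacy.training import Example

# Hypothetical sample data in the (text, {"entities": [...]}) format.
training_data = [
    ("Patients infected with SARS-CoV-2 may develop pneumonia.",
     {"entities": [(23, 33, "Pathogen"), (46, 55, "MedicalCondition")]}),
    ("Remdesivir was given to treat COVID-19.",
     {"entities": [(0, 10, "Medicine"), (30, 38, "MedicalCondition")]}),
]
labels = ["Pathogen", "MedicalCondition", "Medicine"]

# Blank English model with an NER component and the medical labels.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for label in labels:
    ner.add_label(label)

# Convert every (text, annotations) pair into an Example.
examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in training_data
]

# Initialize the weights; in spaCy v3 this also returns an optimizer.
optimizer = nlp.initialize(get_examples=lambda: examples)

# Train only the NER component for a single epoch.
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
losses = {}
with nlp.select_pipes(disable=other_pipes):
    random.shuffle(examples)
    for example in examples:
        # losses is updated in place; the NER loss accumulates under "ner"
        nlp.update([example], sgd=optimizer, losses=losses)

print(losses)
```

Shuffling before the pass avoids always presenting the examples in the same order; for more than one epoch, the shuffle and update loop would simply be repeated.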
