Get startedGet started for free

Training multiple labels

Here's a small sample of a dataset created to train a new entity type WEBSITE. The original dataset contains a few thousand sentences. In this exercise, you'll be doing the labeling by hand. In real life, you probably want to automate this and use an annotation tool – for example, Brat, a popular open-source solution, or Prodigy, our own annotation tool that integrates with spaCy.

After this exercise you will be nearly done with the course! If you enjoyed it, feel free to send Ines a thank you via Twitter - she'll appreciate it! Tweet to Ines

This exercise is part of the course

Advanced NLP with spaCy

View Course

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

TRAINING_DATA = [
    ("Reddit partners with Patreon to help creators build communities", 
     {'entities': [(____, ____, 'WEBSITE'), (____, ____, 'WEBSITE')]}),
  
    ("PewDiePie smashes YouTube record", 
     {'entities': [(____, ____, 'WEBSITE')]}),
  
    ("Reddit founder Alexis Ohanian gave away two Metallica tickets to fans", 
     {'entities': [(____, ___, 'WEBSITE')]}),
    # And so on...
]
Edit and Run Code