
Creating a custom named entity in spaCy

If spaCy's built-in named entities aren't enough, you can make your own using spaCy's EntityRuler() class.

EntityRuler() allows you to create your own entities to add to a spaCy pipeline.

You start by creating an instance of EntityRuler() and passing it the current pipeline, nlp.

You can then call add_patterns() on the instance and pass it a list of dictionaries, each pairing a "label" (the entity tag) with the text "pattern" you'd like to tag with that entity.

Once you've set up a pattern, you can add the ruler to the nlp pipeline using add_pipe().

Since Acme is a technology company, you decide to tag the pattern "smartphone" with the "PRODUCT" entity tag.

spaCy has been imported and a doc already exists containing the transcribed text from the call_4_channel_2.wav file.
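The steps described above can be sketched as follows. This is a minimal sketch, not the course's solution, and it rests on two assumptions: spaCy v3 or later is installed, where add_pipe() takes the component's registered name ("entity_ruler") and returns the ruler rather than accepting an EntityRuler(nlp) instance as the exercise code does; and a blank English pipeline stands in for the pre-loaded nlp object.

```python
import spacy

# Assumption: spaCy v3+. A blank English pipeline stands in for the
# pre-loaded `nlp` object that the exercise environment provides.
nlp = spacy.blank("en")

# In spaCy v3, add_pipe() takes the registered component name and
# returns the newly created EntityRuler instance.
ruler = nlp.add_pipe("entity_ruler")

# Patterns are passed as a list of dictionaries, each pairing an
# entity label with the text pattern to tag.
ruler.add_patterns([{"label": "PRODUCT", "pattern": "smartphone"}])

# Confirm the ruler is now part of the pipeline and holds the pattern.
pipe_names = nlp.pipe_names
patterns = ruler.patterns
print(pipe_names)
print(patterns)
```

On older spaCy v2 installations, the exercise's own form (creating EntityRuler(nlp) and passing the instance to add_pipe()) is the one that applies.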

This is a part of the course

“Spoken Language Processing in Python”


Exercise instructions

  • Import EntityRuler from spacy.pipeline.
  • Add "smartphone" as the value for the "pattern" key.
  • Add the EntityRuler() instance, ruler, to the nlp pipeline.
  • Print the entity attributes contained in doc.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import EntityRuler class
from spacy.pipeline import ____

# Create EntityRuler instance
ruler = EntityRuler(nlp)

# Define pattern for new entity
ruler.add_patterns([{"label": "PRODUCT", "pattern": ____}])

# Update existing pipeline
nlp.add_pipe(____, before="ner")

# Test new entity
for entity in doc.____:
  print(entity.text, entity.label_)
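For readers on spaCy v3 or later, the same flow, including the final loop over doc.ents, might look like the sketch below. It is written under stated assumptions rather than as the course's answer: the pipeline is a blank English one (so it has no "ner" component unless a trained model is loaded instead), and the sentence is a hypothetical stand-in for the transcribed call text.

```python
import spacy

# Assumption: spaCy v3+. A blank English pipeline stands in for the
# course's pre-loaded `nlp`; it contains no "ner" component.
nlp = spacy.blank("en")

# Place the ruler before "ner" only when the pipeline actually has one,
# because add_pipe() raises an error if the named component is missing.
position = {"before": "ner"} if "ner" in nlp.pipe_names else {}
ruler = nlp.add_pipe("entity_ruler", **position)
ruler.add_patterns([{"label": "PRODUCT", "pattern": "smartphone"}])

# Hypothetical sentence standing in for the transcribed call text.
doc = nlp("The smartphone I bought stopped working.")

# Collect and print each entity's text and label.
found = [(entity.text, entity.label_) for entity in doc.ents]
for text, label in found:
    print(text, label)
```

Adding the ruler before "ner" means the rule-based matches are set first, and a statistical entity recognizer that runs afterwards works around those existing spans instead of overwriting them.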



Learn how to load, transform, and transcribe speech from raw audio files in Python.

In this chapter, you'll put everything you've learned together by building a speech processing proof-of-concept project for a technology company, Acme Studios. You'll start by transcribing audio snippets of customer support phone calls to text. Then you'll perform sentiment analysis using NLTK, named entity recognition using spaCy, and text classification using scikit-learn on the transcribed text.

  • Exercise 1: Creating transcription helper functions
  • Exercise 2: Converting audio to the right format
  • Exercise 3: Finding PyDub stats
  • Exercise 4: Transcribing audio with one line
  • Exercise 5: Using the helper functions you've built
  • Exercise 6: Sentiment analysis on spoken language text
  • Exercise 7: Analyzing sentiment of a phone call
  • Exercise 8: Sentiment analysis on formatted text
  • Exercise 9: Named entity recognition on transcribed text
  • Exercise 10: Named entity recognition in spaCy
  • Exercise 11: Creating a custom named entity in spaCy
  • Exercise 12: Classifying transcribed speech with Sklearn
  • Exercise 13: Preparing audio files for text classification
  • Exercise 14: Transcribing phone call excerpts
  • Exercise 15: Organizing transcribed phone call data
  • Exercise 16: Create a spoken language text classifier
  • Exercise 17: Congratulations!
