Creating a custom named entity in spaCy
If spaCy's built-in named entities aren't enough, you can make your own using spaCy's EntityRuler() class.
EntityRuler() allows you to create your own entities to add to a spaCy pipeline.
You start by creating an instance of EntityRuler() and passing it the current pipeline, nlp.
You can then call add_patterns() on the instance and pass it a list of dictionaries, each containing a label and the text pattern you'd like to tag with that label.
Once you've set up a pattern, you can add it to the nlp pipeline using add_pipe().
Since Acme is a technology company, you decide to tag the pattern "smartphone" with the "PRODUCT" entity tag.
spaCy has been imported and a doc already exists containing the transcribed text from the call_4_channel_2.wav file.
This exercise is part of the course “Spoken Language Processing in Python”.
Exercise instructions
- Import EntityRuler from spacy.pipeline.
- Add "smartphone" as the value for the "pattern" key.
- Add the EntityRuler() instance, ruler, to the nlp pipeline.
- Print the entity attributes contained in doc.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import EntityRuler class
from spacy.pipeline import ____
# Create EntityRuler instance
ruler = EntityRuler(nlp)
# Define pattern for new entity
ruler.add_patterns([{"label": "PRODUCT", "pattern": ____}])
# Update existing pipeline
nlp.add_pipe(____, before="ner")
# Test new entity
for entity in doc.____:
    print(entity.text, entity.label_)
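The sample code above follows the spaCy v2 style, where an EntityRuler object is constructed directly and passed to add_pipe(). In spaCy v3 and later, components are added by their registered string name instead. The following is a minimal sketch of the same idea under that assumption. The blank English pipeline and the sample sentence are illustrative stand-ins, not part of the exercise, which uses an existing nlp pipeline and a doc built from the call transcript.

```python
import spacy

# Assumption: spaCy v3 or later is installed.
# A blank English pipeline is used so no trained model download is needed.
nlp = spacy.blank("en")

# In spaCy v3, add_pipe takes the component name and returns the component
ruler = nlp.add_pipe("entity_ruler")

# Tag the exact text "smartphone" with the PRODUCT label
ruler.add_patterns([{"label": "PRODUCT", "pattern": "smartphone"}])

# Hypothetical sentence standing in for the transcribed call text
doc = nlp("I recently bought a smartphone from Acme")

# Collect and print the entities found by the ruler
entities = [(ent.text, ent.label_) for ent in doc.ents]
for text, label in entities:
    print(text, label)
```

With a trained pipeline that already contains an "ner" component, passing before="ner" to add_pipe() places the ruler ahead of the statistical recognizer, as the exercise does.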
Spoken Language Processing in Python teaches you how to load, transform, and transcribe speech from raw audio files in Python.
In this chapter, you'll put everything you've learned together by building a speech processing proof-of-concept project for a technology company, Acme Studios. You'll start by transcribing audio snippets of customer support phone calls to text. Then you'll perform sentiment analysis using NLTK, named entity recognition using spaCy, and text classification using scikit-learn on the transcribed text.