1. Introduction to SpaCy
In this video, we'll take a look at SpaCy, another great library for natural language processing.
2. What is SpaCy?
SpaCy is a NLP library similar to Gensim, but with different implementations, including a particular focus on creating NLP pipelines to generate models and corpora. SpaCy is open-source and has several extra libraries and tools built by the same team, including Displacy - a visualization tool for viewing parse trees which uses Node-js to create interactive text.
3. Displacy entity recognition visualizer
For example, if we use the displacy entity recognition visualizer which has a live demo online, we can enter the sentence used in the last video. Here, we can see the SpaCy has identified three named entities and tagged them with the appropriate entity label -- such as location or person. SpaCy also has tools to build word and document vectors from text.
4. SpaCy NER
To start using spacy for Named entity recognition, we must first install it and download all the appropriate pre-trained word vectors. You can also train vectors yourself and load them; but the pretrained ones let us get started immediately.
We can load those into an object, NLP, which functions similarly to our Gensim dictionary and corpus. It has several linked objects, including entity which is an Entity Recognizer object from the pipeline module. This is what is used to find entities in the text.
Then we load a new document by passing a string into the NLP variable. When the document is loaded, the named entities are stored as a document attribute called ents. We see Spacy properly tagged and identified the three main entities in the sentence.
We can also investigate the labels of each entity by using indexing to pick out the first entity and the label_ attribute to see the label for that particular entity. Here we see the label for Berlin is GPE or Geopolitical entity.
Spacy has several other language models available, including advanced German and Chinese implementations. It's a great tool especially if you want to build your own extraction and natural language processing pipeline quickly and iteratively.
5. Why use SpaCy for NER?
Why use Spacy for NER? Outside of being able to integrate with the other great Spacy features like easy pipeline creation, it has a different set of entity types and often labels entities differently than nltk.
In addition, Spacy comes with informal language corpora, allowing you to more easily find entities in documents like Tweets and chat messages. It's a quickly growing library, so it might even have more languages supported by the time you are watching this video!
6. Let's practice!
For now, however, you can get started using Spacy for named entity recognition!