Wrapping up
1. Wrapping up
Congratulations – you've made it to the end of the course!

2. Your new spaCy skills
Here's an overview of all the new skills you've learned:

In the first chapter, you learned how to extract linguistic features like part-of-speech tags, syntactic dependencies and named entities, and how to work with pre-trained statistical models. You also learned to write powerful match patterns to extract words and phrases using spaCy's matcher and phrase matcher.

Chapter 2 was all about information extraction, and you learned how to work with the data structures Doc, Token and Span, as well as the vocab and lexical entries. You also used spaCy to predict semantic similarities using word vectors.

In chapter 3, you got some more insights into spaCy's pipeline, and learned to write your own custom pipeline components that modify the Doc. You also created your own custom extension attributes for Docs, Tokens and Spans, and learned about processing streams and making your pipeline faster.

Finally, in chapter 4, you learned about training and updating spaCy's statistical models, specifically the entity recognizer. You learned some useful tricks for creating training data, and how to design your label scheme to get the best results.

3. More things to do with spaCy (1)
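As a quick refresher on the match patterns from chapter 1, here's a minimal sketch, assuming spaCy v3's API – the "iPhone 11" text and the pattern are made up for illustration:

```python
import spacy
from spacy.matcher import Matcher

# a blank English pipeline is enough for rule-based matching –
# no trained model needs to be downloaded
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# pattern: the token "iphone" (case-insensitive) followed by a digit token
pattern = [{"LOWER": "iphone"}, {"IS_DIGIT": True}]
matcher.add("IPHONE_MODEL", [pattern])

doc = nlp("The new iPhone 11 goes on sale soon.")
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)  # ['iPhone 11']
```

Each match is returned as a (match_id, start, end) triple, so slicing the Doc gives you back a Span with the matched text.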
Of course, there's a lot more that spaCy can do that we didn't get to cover in this course. While we focused mostly on training the entity recognizer, you can also train and update the other statistical pipeline components, like the part-of-speech tagger and dependency parser.

Another useful pipeline component is the text classifier, which can learn to predict labels that apply to the whole text. It's not part of the pre-trained models, but you can add it to an existing model and train it on your own data.

4. More things to do with spaCy (2)
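To make the text classifier mentioned above concrete: a minimal sketch of adding one to a pipeline, assuming spaCy v3's string-based add_pipe API – the POSITIVE/NEGATIVE labels are made up for illustration, and a real setup would train the component before using it:

```python
import spacy

# start from a blank pipeline here; in practice you'd add the
# component to an existing trained pipeline
nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")

# register the labels the classifier should predict
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

print(nlp.pipe_names)  # ['textcat']
```

After training on your own examples, the component writes its scores to doc.cats as a dictionary of label-to-probability mappings.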
In this course, we basically accepted the default tokenization as it is. But you don't have to! spaCy lets you customize the rules used to determine where and how to split the text. You can also add and improve support for other languages. While spaCy already supports tokenization for many different languages, there's still a lot of room for improvement. Supporting tokenization for a new language is the first step towards being able to train a statistical model.

5. See the website for more info and documentation!
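To illustrate the tokenizer customization described above: you can register a special-case rule that overrides how a particular string is split. This is a minimal sketch assuming spaCy v3; the "gimme" special case mirrors the example in spaCy's documentation:

```python
import spacy
from spacy.symbols import ORTH

nlp = spacy.blank("en")

# with the default rules, "gimme" stays a single token
print([t.text for t in nlp("gimme that")])  # ['gimme', 'that']

# add a special-case rule that splits it into two tokens
nlp.tokenizer.add_special_case("gimme", [{ORTH: "gim"}, {ORTH: "me"}])
print([t.text for t in nlp("gimme that")])  # ['gim', 'me', 'that']
```

The ORTH values of the subtokens have to add up to the original string, so the tokenization still maps back onto the raw text exactly.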
For more examples, tutorials and in-depth API documentation, check out the spaCy website.

6. Thanks and see you soon!
Thanks so much for taking this course! I hope you had fun, and I'm excited to hear about the cool things you end up building with spaCy.