
Linguistic features

1. Linguistic features

Welcome! In this video, we will cover more details on POS tagging and introduce dependency parsing.

2. POS tagging

We have learned how we can use spaCy to extract part-of-speech tags. Each word is assigned a POS tag based on its context: the surrounding words and their POS tags. For example, given a tricky sentence such as "My cat will fish for a fish tomorrow in a fishy way.", the spaCy tagger makes correct POS predictions for the words fish and fishy, tagging the first fish as VERB, the second fish as NOUN, and fishy as ADJ.
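As a minimal sketch of this example (assuming the small English pipeline en_core_web_sm is installed, which the video does not state), we can print each token with its coarse-grained POS tag:

import spacy

# Load a small English pipeline (assumption: en_core_web_sm is installed)
nlp = spacy.load("en_core_web_sm")
doc = nlp("My cat will fish for a fish tomorrow in a fishy way.")

# Print each token alongside its coarse-grained POS tag
for token in doc:
    print(token.text, token.pos_)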

3. What is the importance of POS?

Now, the question we might ask is: what is the importance of POS tags? Many applications need to know a word's type to achieve better accuracy. For example, in translation systems, the word fish as a verb and as a noun maps to different words in Spanish.

4. What is the importance of POS?

Syntactic information such as POS tags can help many tasks further down the pipeline, such as word-sense disambiguation (WSD). WSD is the classical problem of deciding in which sense a word is used in a sentence. Determining the sense of a word can be crucial in search engines, machine translation, and question-answering systems. For example, for the word "play", the POS tagger can help with WSD by labeling the word as a NOUN or VERB depending on its context.

5. Word-sense disambiguation

Let's use POS tagging for WSD. We loop over each token in the Doc container, create a tuple of the token text and its .pos_ tag, and check whether "fish" is in the tokenized text. The word fish in "I will fish tomorrow" has a .pos_ tag of VERB, which correctly identifies its sense as "to catch fish". In the sentence "I ate fish", the word fish has a .pos_ tag of NOUN, which identifies the sense as "an animal".
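A minimal sketch of this check (again assuming en_core_web_sm) might look like this:

import spacy

nlp = spacy.load("en_core_web_sm")

for text in ["I will fish tomorrow.", "I ate fish."]:
    doc = nlp(text)
    # Build (text, POS) tuples for every token in the Doc container
    tags = [(token.text, token.pos_) for token in doc]
    # The tag of "fish" (VERB vs. NOUN) points to the intended sense
    print([(word, pos) for word, pos in tags if word == "fish"])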

6. Dependency parsing

We have learned about POS tags, which are grammatical categories of words. POS tags do not reveal any relation between distant words in a given sentence. This is where dependency parsing comes in. This process provides a structured way of exploring sentence syntax: it analyzes sentence structure via dependencies between tokens. A dependency, or dependency relation, is a directed link between two tokens, and the result of this procedure is always a tree. For example, for the sentence "We understand the differences.", spaCy assigns a dependency label to each token, such as "nsubj", "dobj", and "det". The first arc, labeled nsubj, shows the subject-verb relationship between "We" and "understand".
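As a rough sketch (assuming en_core_web_sm), we can list each token's syntactic head and dependency label to see these relations:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We understand the differences.")

# Each token points to its syntactic head; .dep_ names the relation
for token in doc:
    print(token.text, "<--", token.dep_, "--", token.head.text)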

7. Dependency parsing and spaCy

A dependency label describes the type of syntactic relation between two tokens. A few of the most common dependency labels are shown in the table, such as nsubj (nominal subject), ROOT, det (determiner), dobj (direct object), and aux (auxiliary).

8. Dependency parsing and displaCy

Let's draw our first dependency tree using displaCy. We call spacy.displacy.serve with two arguments: a Doc container for a given text and the style string "dep" (dependency) to display a dependency tree. In a dependency relation, one of the tokens is the parent and the other is its dependent. For example, in the dependency relation between the words "the" and "differences", "the" is the dependent, and the dependency label is "det", which stands for determiner.
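For illustration, a minimal sketch (assuming en_core_web_sm) that starts displaCy's local web server:

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We understand the differences.")

# Serve the dependency tree visualization on a local web server;
# in a Jupyter notebook, displacy.render(doc, style="dep") can be used instead
displacy.serve(doc, style="dep")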

9. Dependency parsing and spaCy

We use the .text and .dep_ attributes of a token to access each token's text and dependency label. We can also use the spacy.explain() function to view the definition of each dependency label.
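A minimal sketch (assuming en_core_web_sm) combining .dep_ with spacy.explain():

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We understand the differences.")

# Print each token, its dependency label, and the label's definition
for token in doc:
    print(token.text, token.dep_, spacy.explain(token.dep_))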

10. Let's practice!

Great job! Let's practice.
