1. Additional NLP analysis
Welcome to the final chapter of this introduction to natural language processing in R. Well done on making it this far! There is no code in this chapter, only an introduction to some powerful text analysis techniques that we did not have time to cover in this course.
2. BERT and ERNIE
BERT,
which stands for Bidirectional Encoder Representations from Transformers, was released by Google in 2018. It is a mouthful to say,
but BERT is simply a pre-trained model that we can use with transfer learning to complete natural language processing tasks.
BERT is pre-trained, meaning that it has already learned a language representation by training on a huge amount of unlabeled data. This all happens before we ever use the model.
When we go to use BERT, it takes only a small amount of labeled data to fine-tune BERT for a specific supervised learning task.
BERT is great at creating features that can be used in NLP models.
Similarly, other knowledge-integration language representations, such as ERNIE, are also emerging and improving NLP analysis.
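To make the pre-train-then-fine-tune idea concrete, here is a toy sketch in Python. This is only a rough analogy, not real BERT: the "pre-training" step learns simple co-occurrence vectors from an invented unlabeled corpus, and the "fine-tuning" step reuses those vectors with just two labeled examples. Every sentence, label, and function name here is made up for illustration.

```python
from collections import Counter

# Invented "unlabeled" corpus for the pre-training step.
unlabeled = [
    "the movie was wonderful and fun",
    "a wonderful fun delightful film",
    "the movie was terrible and boring",
    "a terrible boring dreadful film",
]

def cooccurrence_vectors(sentences):
    """'Pre-training': represent each word by the words it co-occurs with."""
    vectors = {}
    for sent in sentences:
        words = sent.split()
        for w in words:
            vec = vectors.setdefault(w, Counter())
            for other in words:
                if other != w:
                    vec[other] += 1
    return vectors

vectors = cooccurrence_vectors(unlabeled)

def sentence_vector(sentence, vectors):
    """Represent a sentence as the sum of its words' pre-trained vectors."""
    total = Counter()
    for w in sentence.split():
        total.update(vectors.get(w, Counter()))
    return total

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5
    return dot / (norm(a) * norm(b) or 1)

# "Fine-tuning": only a tiny amount of labeled data is needed,
# because the representations were already learned above.
labeled = {"positive": "wonderful fun", "negative": "terrible boring"}
centroids = {lab: sentence_vector(txt, vectors) for lab, txt in labeled.items()}

def classify(sentence):
    scores = {lab: cosine(sentence_vector(sentence, vectors), c)
              for lab, c in centroids.items()}
    return max(scores, key=scores.get)

print(classify("a delightful film"))  # prints "positive"
```

The key point mirrors BERT's workflow: the expensive representation learning happens once on unlabeled text, and the supervised task then needs only a handful of labeled examples.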
3. Named Entity Recognition
Named entity recognition, which is not a new method,
seeks to classify named entities within text.
Examples of named entities are names, locations, organizations, and sometimes values, percentages, and even codes.
So how do we use it? Named entity recognition can be used for tasks such as
extracting entities from tweets that mention a company name,
aiding recommendation engines by recommending similar content or products,
and creating efficient search algorithms, as documents or text can be tagged with the entities found within them.
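A minimal sketch of the idea in Python: real NER systems learn to spot entities from annotated data, but the simplest possible version is a lookup against small hand-made word lists (gazetteers). The entity lists and the example tweet below are invented for illustration.

```python
import re

# Toy gazetteers: real NER models learn these patterns from data
# rather than using fixed lists.
GAZETTEERS = {
    "ORG": {"Google", "DataCamp", "Acme Corp"},
    "LOC": {"Paris", "New York", "Tokyo"},
    "PERSON": {"Ada Lovelace", "Alan Turing"},
}

def extract_entities(text):
    """Return sorted (entity, label) pairs found in the text."""
    found = []
    for label, names in GAZETTEERS.items():
        for name in names:
            # \b ensures we only match whole words, not substrings
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                found.append((name, label))
    return sorted(found)

tweet = "Loving the new course from DataCamp! Heading to Paris next week."
print(extract_entities(tweet))  # [('DataCamp', 'ORG'), ('Paris', 'LOC')]
```

Tagging documents with the entities found in them is exactly what makes the search and recommendation use cases above possible.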
4. Part-of-speech tagging
Even older than named entity recognition is part-of-speech tagging.
Part-of-speech tagging is a simple process that involves tagging each word with the correct part-of-speech.
Each word is labeled as a noun, verb, adjective, or another part of speech. Part-of-speech tagging is used across the NLP landscape. It is used to
aid in sentiment analysis, as the more you know about a sentence, the better you can understand its sentiment.
It is also used to create features for NLP models, because it can
enhance what a model knows about the words being used.
If you are interested, the feature engineering course for NLP in Python goes into more detail about part-of-speech tagging.
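The tag-then-count idea can be sketched in a few lines of Python. Real taggers are trained on large annotated corpora; the tiny lexicon here is hand-written purely for illustration, with unknown words defaulting to NOUN:

```python
# Toy part-of-speech lexicon, invented for this example.
LEXICON = {
    "the": "DET", "a": "DET",
    "movie": "NOUN", "plot": "NOUN", "acting": "NOUN",
    "was": "VERB", "loved": "VERB", "hated": "VERB",
    "wonderful": "ADJ", "boring": "ADJ",
}

def tag(sentence):
    """Tag each word with a part of speech; unknown words default to NOUN."""
    return [(w, LEXICON.get(w, "NOUN")) for w in sentence.lower().split()]

tagged = tag("The movie was wonderful")
print(tagged)
# [('the', 'DET'), ('movie', 'NOUN'), ('was', 'VERB'), ('wonderful', 'ADJ')]

# Tags as model features: for example, counting adjectives, which
# often carry the sentiment of a sentence.
adjective_count = sum(1 for _, t in tagged if t == "ADJ")
print(adjective_count)  # 1
```

Counts like `adjective_count` are exactly the kind of extra feature that tells a sentiment model more about the words being used.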
5. Let's recap
These are just a few of the many additional text analysis methods available right now, and honestly, each topic could have its own DataCamp course devoted to it. Let's recap these additional techniques, and the rest of the course, by reviewing the methods we have discussed.