Transfer learning for language models
1. Transfer learning for language models
In the last chapter, you were introduced to transfer learning for language models using GloVe. Now, let's take a closer look at the different word embeddings available for transfer learning. Depending on the problem at hand, different ways to treat text can improve the model's quality.
2. The idea behind transfer learning
Transfer learning became popular initially for computer vision tasks and was later applied to language models as well. Transfer learning provides a model with better initialization values: instead of initializing the weights to zero or to random numbers, we use the values obtained previously on a similar task. In other words, we reuse knowledge that is already available. Some research institutes and companies started to use their computing power to train models on very large datasets and share the obtained weights with the community. GloVe is an example of this: we were able to use the weights of the GloVe vectors trained on the whole of Wikipedia, a task that would be impossible with limited computing power. This allows many more researchers to advance the current knowledge in many fields by using those shared weights as a starting point and evolving from there. Researchers with limited computing power can achieve state-of-the-art results thanks to this open sourcing of models.
3. Available architectures
There are a few alternatives to GloVe models. Word2Vec was created in 2013 by Google and contains two approaches to model a corpus: the continuous bag-of-words model and the skip-gram model. Continuous bag-of-words uses neighboring, or context, words to predict the center word, while the skip-gram model does the opposite: it uses the center word to predict its context words. FastText was created by Facebook in 2016 as an improvement on the Word2Vec model. It uses each word and the n-grams of its characters to train the model. Finally, ELMo was created by the Allen Institute in 2018 and achieved state-of-the-art results on many NLP tasks. It uses words and bidirectional layers to train the language model. Word2Vec and FastText are available in the gensim package, while ELMo is available on tensorflow_hub. ELMo is out of the scope of this course.
4. Example using Word2Vec
To use Word2Vec in Python, import the Word2Vec class from gensim.models. To train a model, we initialize the Word2Vec class, passing it a corpus. Other parameters include size, which is the dimension of the embedding vectors, window, which is the number of neighboring words to use as context, and iter, which is the number of epochs to train the model. We can use the model, for example, to find similar words in the corpus. For that, we access the word vectors attribute wv and its method most_similar, passing a list of words to find similarities for and the number of similar words to retrieve.
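As a minimal sketch of these steps, assuming a small hand-made tokenized corpus and the gensim 3.x parameter names mentioned above (size, iter); in gensim 4 and later these were renamed vector_size and epochs:

# Minimal Word2Vec sketch (gensim 3.x parameter names; hypothetical toy corpus).
from gensim.models import Word2Vec

# Hypothetical tokenized corpus: a list of sentences, each a list of tokens.
corpus = [
    ["transfer", "learning", "improves", "language", "models"],
    ["word", "embeddings", "capture", "word", "meaning"],
    ["glove", "and", "word2vec", "are", "embedding", "models"],
]

# Train the model: size = embedding dimension, window = number of context words,
# iter = number of epochs. min_count=1 keeps every word of this tiny corpus.
w2v_model = Word2Vec(corpus, size=50, window=3, iter=10, min_count=1)

# Query the word vectors: the 3 words most similar to "word".
print(w2v_model.wv.most_similar(positive=["word"], topn=3))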
5. Example using FastText
Like Word2Vec, FastText is also implemented in gensim and thus shares many attributes and methods. We first import the FastText class from gensim.models. Then, to train a model, we have three steps. First, we instantiate the model with basic parameters such as the size of the embedding vectors and the window, which is the number of neighboring words to use as context. Then, we build the vocabulary based on the corpus. Finally, we train the model, passing the corpus, the total number of documents or sentences in the corpus, and the number of epochs to train. The same methods, such as finding similar words, are also available on the FastText class.
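A sketch of the three steps under the same assumptions (hypothetical toy corpus, gensim 3.x parameter names; in gensim 4+ use vector_size instead of size):

# Minimal FastText sketch following the three steps described above.
from gensim.models import FastText

# Hypothetical tokenized corpus, as in the Word2Vec example.
corpus = [
    ["transfer", "learning", "improves", "language", "models"],
    ["word", "embeddings", "capture", "word", "meaning"],
    ["glove", "and", "word2vec", "are", "embedding", "models"],
]

# Step 1: instantiate the model with the embedding size and context window.
ft_model = FastText(size=50, window=3, min_count=1)

# Step 2: build the vocabulary from the corpus.
ft_model.build_vocab(sentences=corpus)

# Step 3: train, passing the corpus, its total number of sentences, and the epochs.
ft_model.train(sentences=corpus, total_examples=len(corpus), epochs=10)

# The same query methods are available through the word vectors attribute.
print(ft_model.wv.most_similar(positive=["word"], topn=3))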
6. Let's practice!
You have seen different models for creating word embeddings. Now let's have some fun with them!