Word2Vec

In this exercise you will work with a Word2Vec model, loaded through gensim, to find similar words.

The corpus used to pre-train the model is the script of every episode of the TV series The Big Bang Theory, split sentence by sentence. It is available in the variable bigbang.

The text in the corpus was converted to lowercase and all words were tokenized. The result is stored in the variable tokenized_corpus.
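
As a rough illustration of that preprocessing, the sketch below lowercases a few sentences and splits them into word tokens. The sample sentences and the regular expression are assumptions made for illustration; the exercise does not state which tokenizer was used to build tokenized_corpus.

```python
import re

# Hypothetical stand-in for the real `bigbang` corpus (one sentence per item).
sample_sentences = [
    "Bazinga! I fooled you, Penny.",
    "The universe is expanding.",
]


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens.

    Assumption: a token is a run of letters, digits or apostrophes.
    """
    return re.findall(r"[a-z0-9']+", sentence.lower())


# One list of tokens per sentence, mirroring the shape of a tokenized corpus.
tokenized_sample = [tokenize(s) for s in sample_sentences]
print(tokenized_sample)
```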

A Word2Vec model was pre-trained with a context window of 10 words (5 before and 5 after the center word). Words with fewer than 3 occurrences were removed, and the skip-gram method was used with 50 dimensions. The model is saved in the file bigbang_word2vec.model.
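
To make those settings concrete, here is a simplified pure-Python sketch of how skip-gram training pairs can be built from a tokenized corpus. The function name skipgram_pairs and the toy corpus are hypothetical, and this is not the code used to train the model. In gensim, such settings are typically passed to Word2Vec as window=5, min_count=3, sg=1 and vector_size=50 (named size before gensim 4.0), and gensim additionally shrinks the window at random during training, which this sketch leaves out.

```python
from collections import Counter


def skipgram_pairs(tokenized_sentences, window=5, min_count=3):
    """Return (center, context) training pairs for a skip-gram model."""
    # Count every word and keep only those seen at least `min_count` times.
    counts = Counter(w for sent in tokenized_sentences for w in sent)
    vocab = {w for w, c in counts.items() if c >= min_count}

    pairs = []
    for sent in tokenized_sentences:
        kept = [w for w in sent if w in vocab]  # drop rare words first
        for i, center in enumerate(kept):
            # Up to `window` words on each side of the center word.
            lo = max(0, i - window)
            hi = min(len(kept), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, kept[j]))
    return pairs


# Toy corpus: "c" and "d" occur fewer than 3 times and are filtered out.
toy = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "d"]]
pairs = skipgram_pairs(toy, window=1, min_count=3)
print(pairs)
```

With window=5, each center word is paired with at most 10 context words, which matches the "5 before and 5 after" description above.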

The class Word2Vec is already loaded in the environment from gensim.models.word2vec.

This exercise is part of the course

Recurrent Neural Networks (RNNs) for Language Modeling with Keras

Exercise instructions

Load the pre-trained Word2Vec model.
Store a list with the words "bazinga", "penny", "universe", "spock", "brain" in the variable words_of_interest, in that order.
Loop over each word of interest, use the method .most_similar() on the attribute wv, and append the top 5 similar words as a dictionary to top5_similar_words.
Print the top 5 words found for each of the words of interest.
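
For intuition about what .most_similar() returns, the sketch below ranks words by cosine similarity between their vectors, which is the measure gensim uses for this lookup. The 3-dimensional vectors and the helper names are made up for illustration; the real model stores 50-dimensional vectors learned from the corpus.

```python
import math

# Hypothetical 3-dimensional vectors; the real model uses 50 dimensions.
toy_vectors = {
    "penny": [0.9, 0.1, 0.0],
    "leonard": [0.8, 0.2, 0.1],
    "spock": [0.0, 0.9, 0.4],
    "vulcan": [0.1, 0.8, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=5):
    """Rank every other word by cosine similarity to `word`."""
    scores = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Each result is a (word, similarity) tuple, like gensim's output.
nearest = most_similar("penny", toy_vectors, topn=2)
print([w for w, _ in nearest])
```

This is why the sample code keeps only item[0] from each result: the first element of each tuple is the word, the second is its similarity score.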

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Word2Vec model
w2v_model = Word2Vec.load(____)

# Selected words to check similarities
words_of_interest = ____

# Compute top 5 similar words for each of the words of interest
top5_similar_words = []
for word in words_of_interest:
    top5_similar_words.append(
      {word: [item[0] for item in w2v_model.wv.____([word], topn=5)]}
    )

# Print the similar words
____
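
One possible way to fill in the blanks is sketched here. The helper top_similar and the fake model object are hypothetical: the stand-in exists only so the snippet runs without the model file, and the neighbours it returns are placeholders, not real results. In the exercise environment you would load the real model with Word2Vec.load("bigbang_word2vec.model") instead.

```python
from types import SimpleNamespace


def top_similar(model, words, topn=5):
    """Collect the `topn` nearest words for each word as a list of dicts."""
    results = []
    for word in words:
        # most_similar returns (word, similarity) tuples; keep the words only.
        neighbours = model.wv.most_similar([word], topn=topn)
        results.append({word: [item[0] for item in neighbours]})
    return results


# Hypothetical stand-in for the pre-trained model. In the exercise you would use:
#     w2v_model = Word2Vec.load("bigbang_word2vec.model")
def _fake_most_similar(positive, topn=5):
    return [(f"{positive[0]}_{i}", 1.0 - 0.1 * i) for i in range(topn)]


w2v_model = SimpleNamespace(wv=SimpleNamespace(most_similar=_fake_most_similar))

# Selected words to check similarities
words_of_interest = ["bazinga", "penny", "universe", "spock", "brain"]

# Compute and print the top 5 similar words for each word of interest
top5_similar_words = top_similar(w2v_model, words_of_interest)
print(top5_similar_words)
```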

In this chapter, you will learn the foundations of Recurrent Neural Networks (RNNs). You will start with some prerequisites, then see how information flows through the network, and finally implement such models with Keras for the sentiment classification task.

Exercise 1: Introduction to the course
Exercise 2: Comparing the number of parameter of RNN and ANN
Exercise 3: Sentiment analysis
Exercise 4: Sequence to sequence models
Exercise 5: Introduction to language models
Exercise 6: Getting used to text data
Exercise 7: Preparing text data for model input
Exercise 8: Transforming new text
Exercise 9: Introduction to RNN inside Keras
Exercise 10: Keras models
Exercise 11: Keras preprocessing
Exercise 12: Your first RNN model

You will learn about the vanishing and exploding gradient problems, often occurring in RNNs, and how to deal with them with the GRU and LSTM cells. Furthermore, you'll create embedding layers for language models and revisit the sentiment classification task.

Exercise 1: Vanishing and exploding gradients
Exercise 2: Exploding gradient problem
Exercise 3: Vanishing gradient problem
Exercise 4: GRU and LSTM cells
Exercise 5: GRU cells are better than simpleRNN
Exercise 6: Stacking RNN layers
Exercise 7: The Embedding layer
Exercise 8: Number of parameters comparison
Exercise 9: Transfer learning
Exercise 10: Embeddings improves performance
Exercise 11: Sentiment classification revisited
Exercise 12: Better sentiment classification
Exercise 13: Using the CNN layer

Next, in this chapter you will learn how to prepare data for the multi-class classification task, as well as the differences between multi-class classification and binary classification (sentiment analysis). Finally, you will learn how to create models and measure their performance with Keras.

Exercise 1: Data preprocessing
Exercise 2: Prepare label vectors
Exercise 3: Pre-process data
Exercise 4: Transfer learning for language models
Exercise 5: Transfer learning starting point
Exercise 6: Word2Vec (current exercise)
Exercise 7: Multi-class classification models
Exercise 8: Exploring the 20 Newsgroups dataset
Exercise 9: Classifying news articles
Exercise 10: Assessing the model's performance
Exercise 11: Precision-recall trade-off
Exercise 12: Precision or recall, that is the question
Exercise 13: Performance on multi-class classification

This chapter introduces two applications of RNN models: Text Generation and Neural Machine Translation. You will learn how to prepare text data in the format the models need. The Text Generation model is used to replicate a character's way of speaking, and you will have some fun mimicking Sheldon from The Big Bang Theory. Neural Machine Translation is used, for example, by Google Translate in a much more complex model. In this chapter, you will create a model that translates short Portuguese phrases into English.

Exercise 1: Sequence to Sequence Models
Exercise 2: Text generation examples
Exercise 3: NMT example
Exercise 4: The Text Generating Function
Exercise 5: Predict next character
Exercise 6: Generate sentence with context
Exercise 7: Change the probability scale
Exercise 8: Text Generation Models
Exercise 9: Create vectors of sentences and next characters
Exercise 10: Preparing the data for training
Exercise 11: Creating the text generation model
Exercise 12: Neural Machine Translation
Exercise 13: Preparing the input text
Exercise 14: Preparing the output text
Exercise 15: Translate Portuguese to English
Exercise 16: Congratulations!