Create vectors of sentences and next characters

This exercise shows how important good data preparation is. You will use texts containing sentences spoken by the character Sheldon from the TV series The Big Bang Theory as input, and create the vectors of sentences and next characters that are needed before you build a text generation model.

The text is available in the variable sheldon, the vocabulary (characters) in the variable vocabulary, and the hyperparameters chars_window and step are set to 20 and 3, respectively. This means that a sequence of 20 characters is used to predict the next character, and that the window shifts by 3 characters at each iteration.
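To make the window and step concrete, here is a minimal sketch on a short made-up string. The string and the smaller values chars_window = 6 and step = 3 are assumptions chosen for readability; the exercise itself uses 20 and 3.

```python
# Toy illustration of the sliding window (values are assumptions, not the exercise's)
text = "I am not crazy."
chars_window = 6  # the exercise uses 20
step = 3

pairs = []
for i in range(0, len(text) - chars_window, step):
    window = text[i:i + chars_window]    # input sequence
    next_char = text[i + chars_window]   # character the model should predict
    pairs.append((window, next_char))

for window, next_char in pairs:
    print(repr(window), "->", repr(next_char))
# 'I am n' -> 'o'
# 'm not ' -> 'c'
# 'ot cra' -> 'z'
```

Each input overlaps the previous one by chars_window - step characters, so a small step produces more, and more redundant, training pairs.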

In addition, the pandas package is loaded in the environment as pd.

This exercise is part of the course

Recurrent Neural Networks (RNNs) for Language Modeling with Keras

Exercise instructions

Split the text on line breaks so you can loop over the sentences.
Loop up to the end of the sentence minus chars_window.
Append the slice of the sentence with chars_window characters to the variable sentences, and append the character that follows it to the variable next_chars.
Use the resulting vectors to create a pd.DataFrame() and print the first rows.
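The steps above can be sketched as follows. This is one possible way to carry them out, not necessarily the course's reference solution. The two-line sheldon string is a hypothetical stand-in for the variable provided by the exercise environment, and splitting on '\n' assumes one sentence per line.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's `sheldon` variable (one sentence per line)
sheldon = (
    "I am not crazy, my mother had me tested.\n"
    "Scissors cuts paper, paper covers rock."
)
chars_window = 20  # characters used as input
step = 3           # shift of the window per iteration

sentences = []
next_chars = []

# Step 1: split on line breaks to loop over sentences
for sentence in sheldon.split('\n'):
    # Step 2: stop chars_window before the end so a next character always exists
    for i in range(0, len(sentence) - chars_window, step):
        # Step 3: store the window and the character that follows it
        sentences.append(sentence[i:i + chars_window])
        next_chars.append(sentence[i + chars_window])

# Step 4: combine both vectors into a DataFrame and inspect the first rows
df = pd.DataFrame({'sentence': sentences, 'next_char': next_chars})
print(df.head())
```

Because the windows are built per sentence, no input sequence spans two sentences.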

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Instantiate the vectors
sentences = []
next_chars = []
# Loop for every sentence
for sentence in sheldon.____:
    # Get 20 previous chars and next char; then shift by step
    for i in range(0, len(sentence) - ____, step):
        sentences.append(sentence[i:i + ____])
        next_chars.append(sentence[____ + chars_window])

# Define a Data Frame with the vectors
df = pd.DataFrame({'sentence': ____, 'next_char': ____})

# Print the initial rows
print(df.head())
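Why the inner loop stops at the sentence length minus chars_window can be checked with a small helper. The function window_starts is a hypothetical name introduced only for this illustration; it lists the start positions the loop visits for a sentence of a given length.

```python
def window_starts(sentence_length, chars_window=20, step=3):
    """Start indices visited by the inner loop for one sentence."""
    return list(range(0, sentence_length - chars_window, step))

# A 30-character sentence: the last start is 9, so the next character
# sits at index 9 + 20 = 29, the final valid index.
print(window_starts(30))  # [0, 3, 6, 9]

# Sentences with chars_window characters or fewer yield no pairs at all,
# because no character is left to serve as the target.
print(window_starts(20))  # []
print(window_starts(15))  # []
```

With this bound, sentence[i + chars_window] never raises an IndexError, and short sentences are skipped silently.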

In this chapter, you will learn the foundations of Recurrent Neural Networks (RNNs). It starts with some prerequisites, continues with how information flows through the network, and ends with implementing such models in Keras for the sentiment classification task.

Exercise 1: Introduction to the course
Exercise 2: Comparing the number of parameters of RNN and ANN
Exercise 3: Sentiment analysis
Exercise 4: Sequence to sequence models
Exercise 5: Introduction to language models
Exercise 6: Getting used to text data
Exercise 7: Preparing text data for model input
Exercise 8: Transforming new text
Exercise 9: Introduction to RNN inside Keras
Exercise 10: Keras models
Exercise 11: Keras preprocessing
Exercise 12: Your first RNN model

You will learn about the vanishing and exploding gradient problems, often occurring in RNNs, and how to deal with them with the GRU and LSTM cells. Furthermore, you'll create embedding layers for language models and revisit the sentiment classification task.

Exercise 1: Vanishing and exploding gradients
Exercise 2: Exploding gradient problem
Exercise 3: Vanishing gradient problem
Exercise 4: GRU and LSTM cells
Exercise 5: GRU cells are better than simpleRNN
Exercise 6: Stacking RNN layers
Exercise 7: The Embedding layer
Exercise 8: Number of parameters comparison
Exercise 9: Transfer learning
Exercise 10: Embeddings improves performance
Exercise 11: Sentiment classification revisited
Exercise 12: Better sentiment classification
Exercise 13: Using the CNN layer

Next, in this chapter you will learn how to prepare data for the multi-class classification task, as well as the differences between multi-class classification and binary classification (sentiment analysis). Finally, you will learn how to create models and measure their performance with Keras.

Exercise 1: Data pre-processing
Exercise 2: Prepare label vectors
Exercise 3: Pre-process data
Exercise 4: Transfer learning for language models
Exercise 5: Transfer learning starting point
Exercise 6: Word2Vec
Exercise 7: Multi-class classification models
Exercise 8: Exploring 20 News Groups dataset
Exercise 9: Classifying news articles
Exercise 10: Assessing the model's performance
Exercise 11: Precision-Recall trade-off
Exercise 12: Precision or Recall, that is the question
Exercise 13: Performance on multi-class classification

This chapter introduces you to two applications of RNN models: Text Generation and Neural Machine Translation. You will learn how to prepare the text data in the format the models need. The Text Generation model replicates a character's way of speaking, and you will have some fun mimicking Sheldon from The Big Bang Theory. Neural Machine Translation is used, for example, by Google Translate, in a much more complex model. In this chapter, you will create a model that translates short Portuguese phrases into English.

Exercise 1: Sequence to sequence models
Exercise 2: Text generation examples
Exercise 3: NMT example
Exercise 4: The text generating function
Exercise 5: Predict the next character
Exercise 6: Generate sentence with context
Exercise 7: Change the probability scale
Exercise 8: Text generation models
Exercise 9: Create vectors of sentences and next characters (current exercise)
Exercise 10: Preparing the data for training
Exercise 11: Creating the text generation model
Exercise 12: Neural machine translation
Exercise 13: Preparing the input text
Exercise 14: Preparing the output text
Exercise 15: Translate Portuguese to English
Exercise 16: Congratulations!