
Pre-process data

You have learned how pre-processing differs in the case of multi-class classification. Put that into practice by pre-processing the data for a simple multi-class classification model.

The dataset is loaded in the variable news_dataset and has the following attributes:

  • news_dataset.data: array of texts
  • news_dataset.target: array of target categories, encoded as numerical indexes

The sample data contains 5,000 observations.

This exercise is part of the course

Recurrent Neural Networks (RNNs) for Language Modeling with Keras


Exercise instructions

  • Instantiate the Tokenizer class and assign it to the tokenizer variable.
  • Fit the tokenizer on the text data.
  • Use the .texts_to_sequences() method to convert the text data into sequences of integer indexes.
  • Use the to_categorical() function to one-hot encode the target indexes.
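
To make the four steps concrete, here is a minimal pure-Python sketch of what they do conceptually. It is not the Keras implementation: the helper names (fit_word_index, to_sequences, pad, one_hot) and the toy texts and targets are hypothetical stand-ins, and the sketch only lowercases and splits on whitespace, whereas the Keras Tokenizer also filters punctuation by default. It assumes, as Keras does by default, that index 0 is reserved for padding and that padding and truncation happen at the start of each sequence.

```python
from collections import Counter

# Hypothetical toy stand-ins for news_dataset.data and news_dataset.target.
texts = ["the cat sat", "the dog barked loudly", "a cat and a dog"]
targets = [0, 2, 1]


def fit_word_index(texts):
    """Map each word to an integer, most frequent first; 0 is kept for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def to_sequences(texts, word_index):
    """Replace each known word with its index; unknown words are skipped."""
    return [
        [word_index[word] for word in text.lower().split() if word in word_index]
        for text in texts
    ]


def pad(sequences, maxlen):
    """Keep the last maxlen items of each sequence and left-pad with zeros."""
    padded = []
    for seq in sequences:
        kept = seq[-maxlen:]
        padded.append([0] * (maxlen - len(kept)) + kept)
    return padded


def one_hot(labels, num_classes=None):
    """Turn each class index into a vector with a single 1.0 at that index."""
    if num_classes is None:
        num_classes = max(labels) + 1
    return [
        [1.0 if j == label else 0.0 for j in range(num_classes)]
        for label in labels
    ]


word_index = fit_word_index(texts)
prep_data = pad(to_sequences(texts, word_index), maxlen=5)
target_labels = one_hot(targets)

# Shapes as (rows, columns): one row per text, one column per position or class.
print(len(prep_data), len(prep_data[0]))
print(len(target_labels), len(target_labels[0]))
```

The same reasoning applies to the exercise: with 5,000 observations and maxlen=200, the padded data has 5,000 rows of 200 indexes each, and the label matrix has 5,000 rows with one column per category.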

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Create and fit tokenizer
tokenizer = ____
tokenizer.fit_on_texts(____)

# Prepare the data
prep_data = tokenizer.____(news_dataset.data)
prep_data = pad_sequences(prep_data, maxlen=200)

# Prepare the labels
target_labels = to_categorical(____)

# Print the shapes
print(prep_data.shape)
print(target_labels.shape)