Pre-process data

You have learned how pre-processing differs for multi-class classification. Now put that into practice by pre-processing the data in preparation for building a simple multi-class classification model.

The dataset is loaded in the variable news_dataset, and has the following attributes:

  • news_dataset.data: array with texts
  • news_dataset.target: array with target categories as numerical indexes

The sample data contains 5,000 observations.

This exercise is part of the course

Recurrent Neural Networks (RNNs) for Language Modeling with Keras


Exercise instructions

  • Instantiate the Tokenizer class and assign it to the tokenizer variable.
  • Fit the tokenizer variable on the text data.
  • Use the .texts_to_sequences() method on the text data.
  • Use the to_categorical() function to one-hot encode the target indexes.
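
To make the steps above concrete, here is a minimal pure-Python sketch of what they do conceptually. It is not the Keras implementation: the helper names (fit_word_index, texts_to_sequences, one_hot), the toy texts, and the toy targets are all hypothetical, and the sketch only lowercases and splits on whitespace, whereas the Keras Tokenizer also filters punctuation by default.

```python
from collections import Counter

def fit_word_index(texts):
    """Assign each word an integer index, most frequent first, starting at 1.

    Index 0 is left unused so it can serve as the padding value later.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    # sorted() is stable, so words with equal counts keep first-seen order
    ordered = sorted(counts, key=lambda w: -counts[w])
    return {word: i for i, word in enumerate(ordered, start=1)}

def texts_to_sequences(texts, word_index):
    """Replace every known word with its integer index; unknown words are dropped."""
    return [
        [word_index[w] for w in text.lower().split() if w in word_index]
        for text in texts
    ]

def one_hot(targets, num_classes=None):
    """Turn integer class indexes into one-hot rows (one column per class)."""
    if num_classes is None:
        num_classes = max(targets) + 1
    return [[1.0 if j == t else 0.0 for j in range(num_classes)] for t in targets]

# Hypothetical stand-ins for news_dataset.data and news_dataset.target
texts = ["the team won the game", "the market fell", "team news"]
targets = [0, 1, 0]

word_index = fit_word_index(texts)
sequences = texts_to_sequences(texts, word_index)
labels = one_hot(targets)

print(sequences)
print(labels)
```

Each text becomes a list of integers of varying length, and each target index becomes a row with a single 1.0 in the column of its class. One column per class is the label format a multi-class model with a softmax output typically expects.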

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create and fit tokenizer
tokenizer = ____
tokenizer.fit_on_texts(____)

# Prepare the data
prep_data = tokenizer.____(news_dataset.data)
prep_data = pad_sequences(prep_data, maxlen=200)

# Prepare the labels
target_labels = to_categorical(____)

# Print the shapes
print(prep_data.shape)
print(target_labels.shape)
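
The padding step is what gives the data a fixed, rectangular shape. The sketch below imitates the default behaviour of pad_sequences (padding and truncating at the front of each sequence) in plain Python; pad_pre and the toy sequences are hypothetical names and data used only for illustration.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences with `value`; keep only the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                            # truncate from the front
        padded.append([value] * (maxlen - len(seq)) + seq)   # pad at the front
    return padded

# Hypothetical tokenized texts of different lengths
toy_sequences = [[1, 2, 3, 1, 4], [1, 5, 6], [2, 7]]

toy_padded = pad_pre(toy_sequences, maxlen=4)
toy_shape = (len(toy_padded), len(toy_padded[0]))

print(toy_padded)
print(toy_shape)
```

By the same logic, with 5,000 observations and maxlen=200 the first shape printed in the exercise should be (5000, 200). The second should have 5,000 rows and one column per category, assuming the target indexes start at 0.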