Pre-process data
You learned how pre-processing the data differs in the case of multi-class classification. Let's put that into practice by pre-processing the data in preparation for building a simple multi-class classification model.
The dataset is loaded in the variable news_dataset, and has the following attributes:
- news_dataset.data: array with texts
- news_dataset.target: array with target categories as numerical indexes
The sample data contains 5,000 observations.
This exercise is part of the course
Recurrent Neural Networks (RNNs) for Language Modeling with Keras
Exercise instructions
- Instantiate the Tokenizer class on the tokenizer variable.
- Fit the tokenizer variable on the text data.
- Use the .texts_to_sequences() method on the text data.
- Use the to_categorical() function to prepare the target indexes.
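To make the first three steps concrete, here is a minimal plain-Python sketch of what fitting a tokenizer, converting texts to sequences, and padding do conceptually. It is not Keras's implementation: the helper names fit_word_index, texts_to_sequences, and pad, as well as the two toy sentences, are hypothetical and exist only for illustration. The sketch assumes the usual Keras defaults, where words are indexed by frequency starting at 1, and padding and truncation are applied at the start of each sequence with the value 0.

```python
from collections import Counter


def fit_word_index(texts):
    """Map each word to an integer index, most frequent word first."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    # Indexes start at 1 so that 0 stays free to act as the padding value.
    return {word: i for i, (word, _) in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    """Replace each known word by its index; unknown words are skipped."""
    return [
        [word_index[w] for w in text.lower().split() if w in word_index]
        for text in texts
    ]


def pad(sequences, maxlen):
    """Give every sequence length `maxlen` (assumed positive).

    Long sequences keep their last `maxlen` items; short ones are
    left-padded with zeros.
    """
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded


# Hypothetical toy corpus standing in for news_dataset.data
texts = ["the cat sat", "the dog sat on the mat"]

word_index = fit_word_index(texts)
sequences = texts_to_sequences(texts, word_index)
padded = pad(sequences, maxlen=5)

print(word_index)
print(sequences)  # [[1, 3, 2], [1, 4, 2, 5, 1, 6]]
print(padded)     # [[0, 0, 1, 3, 2], [4, 2, 5, 1, 6]]
```

After padding, every row has the same length, which is why the data can be stored as a single two-dimensional array and fed to a model.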
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create and fit tokenizer
tokenizer = ____
tokenizer.fit_on_texts(____)
# Prepare the data
prep_data = tokenizer.____(news_dataset.data)
prep_data = pad_sequences(prep_data, maxlen=200)
# Prepare the labels
target_labels = to_categorical(____)
# Print the shapes
print(prep_data.shape)
print(target_labels.shape)
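The two printed shapes should be (number of texts, 200) for prep_data and (number of texts, number of categories) for target_labels. As a rough illustration of where the second shape comes from, the plain-Python sketch below one-hot encodes a list of category indexes. It is not the Keras function itself: the name one_hot and the toy targets list are hypothetical, and the sketch assumes the number of classes is inferred as the largest index plus one when it is not given.

```python
def one_hot(indexes, num_classes=None):
    """Turn integer class indexes into one-hot rows."""
    if num_classes is None:
        # Assumption: infer the class count from the largest index seen.
        num_classes = max(indexes) + 1
    return [
        [1.0 if column == index else 0.0 for column in range(num_classes)]
        for index in indexes
    ]


# Hypothetical category indexes standing in for news_dataset.target
targets = [0, 2, 1, 2]

labels = one_hot(targets)
shape = (len(labels), len(labels[0]))

print(labels)
print(shape)  # (4, 3): four observations, three categories
```

Each row contains a single 1.0 in the column of its category, which is the label format a softmax output layer trained with categorical cross-entropy expects.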