Exercise

Pre-process data

You learned how preprocessing the data differs in the case of multi-class classification. Let's put that into practice by preprocessing the data in preparation for a simple multi-class classification model.
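One key difference is on the target side. A binary classifier can use a single column of 0/1 labels, while a multi-class model with a softmax output is typically trained on one one-hot row per observation (together with the categorical_crossentropy loss). The NumPy sketch below mimics what Keras's to_categorical() produces for integer class indexes. The helper name one_hot and the sample labels are hypothetical, chosen only for illustration.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Hypothetical helper: row i gets a 1.0 in column labels[i], like to_categorical()."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Assume classes are 0..max(labels), which is also how to_categorical() infers it.
        num_classes = labels.max() + 1
    return np.eye(num_classes, dtype="float32")[labels]

multi_y = np.array([0, 2, 1, 2])   # multi-class task: integer class indexes
y_onehot = one_hot(multi_y)

print(y_onehot.shape)  # (4, 3)
print(y_onehot[1])     # [0. 0. 1.]
```

Each row sums to 1, so the softmax output layer needs exactly as many units as there are columns.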

The dataset is loaded into the variable news_dataset and has the following attributes:

  • news_dataset.data: array of texts
  • news_dataset.target: array of target categories as numerical indexes

The sample data contains 5,000 observations.
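Because to_categorical() infers the number of classes as the largest index plus one unless num_classes is given, it can help to check which category indexes actually occur before encoding. Outside the exercise environment news_dataset does not exist, so this sketch builds a hypothetical three-document stand-in with the same two attributes; SimpleNamespace is only a placeholder for the real object.

```python
from types import SimpleNamespace
import numpy as np

# Hypothetical stand-in for the exercise's news_dataset (the real one has 5,000 rows).
news_dataset = SimpleNamespace(
    data=np.array([
        "The team won the final match",
        "New GPU drivers improve performance",
        "Central bank raises interest rates",
    ]),
    target=np.array([0, 1, 2]),
)

# Each text must have exactly one target index.
assert len(news_dataset.data) == len(news_dataset.target)

classes, counts = np.unique(news_dataset.target, return_counts=True)
class_counts = dict(zip(classes.tolist(), counts.tolist()))
print("class counts:", class_counts)                         # class counts: {0: 1, 1: 1, 2: 1}
print("num_classes:", int(news_dataset.target.max()) + 1)    # num_classes: 3
```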

Instructions

  • Instantiate the Tokenizer class and assign it to the tokenizer variable.
  • Fit tokenizer on the text data using its .fit_on_texts() method.
  • Use the .texts_to_sequences() method to convert the text data into sequences of word indexes.
  • Use the to_categorical() function to one-hot encode the target indexes.
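In Keras, the four steps correspond roughly to tokenizer = Tokenizer(), tokenizer.fit_on_texts(news_dataset.data), tokenizer.texts_to_sequences(news_dataset.data), and to_categorical(news_dataset.target); the exact import paths depend on the installed Keras version. To show what those calls compute without depending on Keras, the sketch below reimplements a simplified version in plain Python and NumPy: it lowercases, replaces Keras's default filter characters with spaces, ranks words by frequency starting at index 1, and drops unseen words. The class SimpleTokenizer is hypothetical and ignores Keras options such as num_words and oov_token.

```python
import numpy as np

# Characters Keras' Tokenizer strips by default (its `filters` argument).
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
_TABLE = str.maketrans({c: " " for c in FILTERS})

class SimpleTokenizer:
    """Hypothetical, simplified stand-in for Keras' Tokenizer (no num_words/oov_token)."""

    def __init__(self):
        self.word_counts = {}   # dict keeps first-appearance order
        self.word_index = {}

    def _split(self, text):
        # Lowercase, turn filtered characters into spaces, split on whitespace.
        return text.lower().translate(_TABLE).split()

    def fit_on_texts(self, texts):
        for text in texts:
            for word in self._split(text):
                self.word_counts[word] = self.word_counts.get(word, 0) + 1
        # Most frequent word gets index 1 (0 stays free for padding);
        # sorted() is stable, so ties keep first-seen order.
        ranked = sorted(self.word_counts, key=self.word_counts.get, reverse=True)
        self.word_index = {word: i for i, word in enumerate(ranked, start=1)}

    def texts_to_sequences(self, texts):
        # Words never seen during fitting are silently dropped.
        return [[self.word_index[w] for w in self._split(t) if w in self.word_index]
                for t in texts]

texts = ["The cat sat.", "The dog sat!", "A bird flew"]
targets = np.array([0, 1, 2])

tokenizer = SimpleTokenizer()
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
# One-hot targets, mirroring to_categorical(targets).
y = np.eye(targets.max() + 1, dtype="float32")[targets]

print(tokenizer.word_index)  # {'the': 1, 'sat': 2, 'cat': 3, 'dog': 4, 'a': 5, 'bird': 6, 'flew': 7}
print(sequences)             # [[1, 3, 2], [1, 4, 2], [5, 6, 7]]
print(y.shape)               # (3, 3)
```

Note that the sequences have different lengths; before feeding them to a model they are usually padded to a common length.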