Übung

Transforming unseen data

When creating vectors from text, any transformations that you perform before training a machine learning model, you also need to apply on the new unseen (test) data. To achieve this follow the same approach from the last chapter: fit the vectorizer only on the training data, and apply it to the test data.

For this exercise the speech_df DataFrame has been split in two:

train_speech_df: The training set consisting of the first 45 speeches.
test_speech_df: The test set consisting of the remaining speeches.

Anleitung

100 XP

Instantiate TfidfVectorizer.
Fit the vectorizer and apply it to the text_clean column.
Apply the same vectorizer on the text_clean column of the test data.
Create a DataFrame of these new features from the test set.

.css-6su6fj{-webkit-flex-shrink:0;-ms-flex-negative:0;flex-shrink:0;}Übung

Anleitung

Übung