1. Learn
  2. /
  3. Kurse
  4. /
  5. Feature Engineering for Machine Learning in Python

Connected

Übung

Transforming unseen data

When creating vectors from text, any transformations that you perform before training a machine learning model, you also need to apply on the new unseen (test) data. To achieve this follow the same approach from the last chapter: fit the vectorizer only on the training data, and apply it to the test data.

For this exercise the speech_df DataFrame has been split in two:

  • train_speech_df: The training set consisting of the first 45 speeches.
  • test_speech_df: The test set consisting of the remaining speeches.

Anleitung

100 XP
  • Instantiate TfidfVectorizer.
  • Fit the vectorizer and apply it to the text_clean column.
  • Apply the same vectorizer on the text_clean column of the test data.
  • Create a DataFrame of these new features from the test set.