Exercise

Generating training and test sets

Before we can apply the t-SNE algorithm for dimensionality reduction, we need to split the original data into a training set and a test set. Next, we will under-sample the majority class to generate a balanced training set.

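Balancing the classes is good practice when training tree-based models on heavily imbalanced data such as fraud records; otherwise the model can score well simply by predicting the majority class.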
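The split-then-under-sample idea described above can be sketched on toy data. This is only an illustration under stated assumptions, not the exercise solution: the toy table, the label column name Class (1 = fraud, 0 = non-fraud), the 80/20 split ratio and all object names are hypothetical.

```r
library(data.table)

set.seed(1234)

# Toy stand-in (hypothetical): 1000 records, 2% labelled as fraud (Class == 1)
toy <- data.table(
  Amount = runif(1000, min = 1, max = 500),
  Class  = rep(c(1, 0), times = c(20, 980))
)

# Hold out 20% of the rows as a test set (the 80/20 ratio is an assumption)
test_idx  <- sample(nrow(toy), size = 0.2 * nrow(toy))
test_set  <- toy[test_idx]
train_set <- toy[-test_idx]

# Under-sample the majority class so that both classes have the same row count
train_pos <- train_set[Class == 1]
train_neg <- train_set[Class == 0][sample(.N, nrow(train_pos))]
train_balanced <- rbind(train_pos, train_neg)
```

Only the training set is balanced here; the test set keeps the original class proportions so that evaluation reflects the real data.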

In this exercise, the creditcard dataset is already loaded in your environment, along with the ggplot2 and data.table packages.

Instructions 1/2
  • Store fraudulent records in creditcard_pos and non-fraudulent records in creditcard_neg.
  • Fix the seed to 1234.
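A minimal sketch of these two steps, assuming the label column is named Class with 1 marking fraud and 0 marking non-fraud (the exercise text does not state the column name). The small creditcard table below is a hypothetical stand-in so that the snippet runs on its own; in the exercise the real dataset is already loaded.

```r
library(data.table)

# Toy stand-in for the real creditcard data (hypothetical values).
# Assumption: the label column is named Class, with 1 = fraud and 0 = non-fraud.
creditcard <- data.table(
  Amount = c(12.5, 250.0, 3.99, 80.0, 1500.0, 42.0),
  Class  = c(0, 1, 0, 0, 1, 0)
)

# Store fraudulent and non-fraudulent records separately
creditcard_pos <- creditcard[Class == 1]
creditcard_neg <- creditcard[Class == 0]

# Fix the seed so that any later random sampling is reproducible
set.seed(1234)
```

Fixing the seed before sampling means the same rows are drawn on every run, which keeps the resulting training and test sets comparable across sessions.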