
Advanced transformations

1. Advanced transformations

We will now move on to learn and try out a few less common, yet powerful, feature transformation techniques.

2. The data

We will be working with a numeric cut of the loans dataset that contains Loan_Status and five features for 480 observations, such as ApplicantIncome, LoanAmount, and Credit_History. Our goal is to predict Loan_Status and explore the effect of a few new transformations.

3. Predict with plain workflow

We configure our plain recipe by defining a prediction formula on the training data, bundle it with our model via a workflow, and use this as our baseline. We then fit the workflow on the training data and assess its performance on the test set. Our plain workflow does better than random at predicting Loan_Status, with an accuracy of 0-point-75. Let's see if we can improve it with a few feature engineering tricks.
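The general fit-then-assess pattern can be sketched outside tidymodels too. Here is a minimal Python analogue on hypothetical stand-in data (the names, the 80/20-style split, and the simple lookup "model" are all assumptions for illustration, not the actual loans workflow):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the loans data: one predictor and a binary
# Loan_Status that loosely depends on it (20% label noise).
credit_history = rng.integers(0, 2, size=480)
loan_status = np.where(rng.random(480) < 0.2, 1 - credit_history, credit_history)

# Split the 480 observations into train and test sets.
train_idx, test_idx = np.arange(360), np.arange(360, 480)

# "Fit": learn the majority outcome for each credit-history value on train.
rule = {v: np.bincount(loan_status[train_idx][credit_history[train_idx] == v]).argmax()
        for v in (0, 1)}

# "Assess": apply the learned rule to the test set and compute accuracy.
preds = np.array([rule[v] for v in credit_history[test_idx]])
accuracy = (preds == loan_status[test_idx]).mean()
print(round(accuracy, 2))
```

A baseline like this should beat the 0.5 accuracy of random guessing, which is the same sanity check we apply to the plain workflow.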

4. The step_poly() function

step_poly() will seamlessly create new variables based on powers of the original ones. The default degree, which we use here, is two; this adds a squared feature to our data set so we can capture non-linear relationships. Of course, increasing the degree to gain more flexibility is always tempting, but this should be done carefully, as it can result in overfitting. This transformation maintained our accuracy metric but improved our ROC AUC significantly.
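The idea behind the expansion can be sketched in a few lines of Python. This uses raw powers for clarity (step_poly() itself builds an orthogonal polynomial basis), and the values standing in for LoanAmount are hypothetical:

```python
import numpy as np

# Hypothetical numeric predictor standing in for LoanAmount.
loan_amount = np.array([100.0, 128.0, 66.0, 120.0, 141.0])

degree = 2  # the default degree used by step_poly()

# Build one column per power of the original predictor:
# degree two yields the original values plus their squares.
expanded = np.column_stack([loan_amount ** d for d in range(1, degree + 1)])

print(expanded.shape)  # one row per observation, one column per power
```

A model fit on both columns can then capture a curved relationship between the predictor and the outcome, which a single linear term cannot.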

5. The step_percentile() function

step_percentile() determines the empirical distribution of a variable based on the training set and converts all values to percentiles. Sometimes replacing variables with their percentiles can improve our model. There is no clear-cut rule for this, and it is one example of feature engineering being as much art as it is science. In this case, we are going to convert the numeric predictors to percentiles. Implementing this transformation improves our accuracy but decreases ROC AUC.
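The key detail is that the percentiles are learned from the training set and then reused for any new value. A minimal Python sketch, assuming hypothetical ApplicantIncome values:

```python
import numpy as np

# Hypothetical training values standing in for ApplicantIncome.
train_income = np.array([2500.0, 3000.0, 3200.0, 4100.0, 5800.0, 9000.0])

def to_percentile(values, reference):
    """Fraction of the reference (training) sample <= each value."""
    values = np.atleast_1d(values)
    return np.array([(reference <= v).mean() for v in values])

# Each training point maps to its own empirical percentile.
print(to_percentile(train_income, train_income))

# A new (e.g. test-set) value is scored against the training distribution,
# never against the test distribution itself.
print(to_percentile(4100.0, train_income))
```

Scoring test-set values against the training distribution is what keeps the transformation free of data leakage.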

6. Let's practice!

It's now time to get our hands dirty!