Exercise

Feature transformations

You are discussing the credit dataset with the bank manager. She suggests that the safest loan applications tend to request mid-range credit amounts, while values that are either too low or too high indicate high risk. This means that a non-linear relationship may exist between this variable and the class. You want to test this hypothesis: you will construct a non-linear transformation of the feature, and then assess which of the two features is better at predicting the class using SelectKBest() and the chi2() metric, both of which have been preloaded.

The data is available as a pandas DataFrame called credit, with the class contained in the column class. pandas has also been preloaded as pd, and numpy as np.

Instructions

  • Define a function that transforms a numeric vector into the absolute difference of each value from the average value of the vector.
  • Apply this transformation to the credit_amount column of the dataset and store the result in a new column called diff.
  • Create a SelectKBest() feature selector that uses the chi2() metric to pick one of the two columns, credit_amount and diff.
  • Inspect the results.
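The steps above can be sketched as follows. This is a minimal sketch, not the official solution: because the real credit DataFrame is only available inside the exercise, the block builds a hypothetical synthetic stand-in in which applications far from the average amount are labelled "bad", mimicking the manager's hypothesis. The helper name abs_diff is also hypothetical. SelectKBest and chi2 are the scikit-learn functions from sklearn.feature_selection.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical stand-in for the preloaded `credit` DataFrame (synthetic data):
# amounts far from the average are labelled 'bad' to mimic the hypothesis.
rng = np.random.default_rng(0)
amounts = rng.integers(250, 18000, size=1000)
labels = np.where(np.abs(amounts - amounts.mean()) > 5000, "bad", "good")
credit = pd.DataFrame({"credit_amount": amounts, "class": labels})


def abs_diff(x):
    """Absolute difference of each value from the mean of the vector."""
    return np.abs(x - np.mean(x))


# Non-linear transformation of the feature, stored in a new column
credit["diff"] = abs_diff(credit["credit_amount"])

# Keep the single best of the two columns according to the chi2 score
X = credit[["credit_amount", "diff"]]
y = credit["class"]
selector = SelectKBest(chi2, k=1)
selector.fit(X, y)

# Inspect the per-feature scores and the column that was kept
print(dict(zip(X.columns, selector.scores_)))
selected = X.columns[selector.get_support()].tolist()
print(selected)
```

Note that chi2() requires non-negative feature values. Both columns satisfy this here, since credit amounts are positive and diff is an absolute value. On the synthetic data, the symmetric risk pattern leaves credit_amount with a much lower chi2 score than diff; on the real dataset, the scores are what you should inspect to judge the manager's hypothesis.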