Feature transformations
You are discussing the credit dataset with the bank manager. She suggests that the safest loan applications tend to request mid-range credit amounts. Values that are either too low or too high suggest high risk. This means that a non-linear relationship might exist between this variable and the class. You want to test this hypothesis. You will construct a non-linear transformation of the feature. Then, you will assess which of the two features is better at predicting the class using SelectKBest() and the chi2() metric, both of which have been preloaded.
The data is available as a pandas DataFrame called credit, with the class contained in the column class. You also have preloaded pandas as pd and numpy as np.
This exercise is part of the course Designing Machine Learning Workflows in Python.
Exercise instructions
- Define a function that transforms a numeric vector by considering the absolute difference of each value from the average value of the vector.
- Apply this transformation to the credit_amount column of the dataset and store the result in a new column called diff.
- Create a SelectKBest() feature selector to pick one of the two columns, credit_amount and diff, using the chi2() metric.
- Inspect the results.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Function computing absolute difference from column mean
def abs_diff(x):
    return ____(x-____)
# Apply it to the credit amount and store to new column
credit['diff'] = ____
# Create a feature selector with chi2 that picks one feature
sk = ____(chi2, ____)
# Use the selector to pick between credit_amount and diff
sk.fit(____, credit['class'])
# Inspect the results
sk.____()
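
The template above can be completed in more than one way. Below is a minimal sketch of one possible completion, not the official solution. Because the real credit dataset is not available here, it builds a small synthetic stand-in (an assumption made purely for illustration) in which mid-range amounts are labelled 'good' and extreme amounts 'bad', matching the manager's hypothesis. The imports are written out because nothing is preloaded outside the exercise environment, and get_support() is just one of several ways to inspect a fitted selector.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic stand-in for the real data (hypothetical values, not the course dataset):
# amounts from 1000 to 9000, with mid-range amounts labelled 'good'.
amount = np.arange(1000, 9001, 20)
credit = pd.DataFrame({
    'credit_amount': amount,
    'class': np.where(np.abs(amount - amount.mean()) < 2000, 'good', 'bad'),
})

# Function computing absolute difference from column mean
def abs_diff(x):
    return np.abs(x - np.mean(x))

# Apply it to the credit amount and store to new column
credit['diff'] = abs_diff(credit['credit_amount'])

# Create a feature selector with chi2 that picks one feature
sk = SelectKBest(chi2, k=1)

# Use the selector to pick between credit_amount and diff
# (chi2 requires non-negative features, which both columns are)
sk.fit(credit[['credit_amount', 'diff']], credit['class'])

# Inspect the results: a boolean mask over the two columns, plus the raw scores
print(sk.get_support())
print(sk.scores_)
```

The mask returned by get_support() follows the column order passed to fit(), so the first entry refers to credit_amount and the second to diff. On the real dataset the outcome depends on the data, so the mask should be read as evidence for or against the hypothesis rather than assumed in advance.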