
Feature transformations

You are discussing the credit dataset with the bank manager. She suggests that the safest loan applications tend to request mid-range credit amounts, while values that are either too low or too high suggest high risk. This means that a non-linear relationship might exist between this variable and the class. To test this hypothesis, you will construct a non-linear transformation of the feature. Then, you will assess which of the two features, the original or the transformed one, is better at predicting the class using SelectKBest() and the chi2() metric, both of which have been preloaded.

The data is available as a pandas DataFrame called credit, with the class contained in the column class. pandas and numpy have also been preloaded as pd and np.
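To make the idea of the transformation concrete, here is a minimal sketch on a small made-up vector (the values and the helper name abs_diff_from_mean are illustrative assumptions, not part of the exercise data). It shows that mid-range values map to small outputs, while both very low and very high values map to large ones.

```python
import numpy as np

def abs_diff_from_mean(x):
    """Return the absolute distance of each value from the vector's mean."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean())

# Made-up credit amounts, chosen only for illustration
amounts = np.array([1000, 2000, 3000, 4000, 5000])
diff = abs_diff_from_mean(amounts)
print(diff)
```

After this transformation, a model or metric that only detects monotonic trends can treat "far from typical" as a single direction, which is what the manager's hypothesis describes.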

This exercise is part of the course

Designing Machine Learning Workflows in Python


Exercise instructions

  • Define a function that transforms a numeric vector by taking the absolute difference of each value from the average value of the vector.
  • Apply this transformation to the credit_amount column of the dataset and store the result in a new column called diff.
  • Create a SelectKBest() feature selector to pick one of the two columns, credit_amount and diff, using the chi2() metric.
  • Inspect the results.
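The steps above can be sketched end to end on synthetic data. This is an illustration under stated assumptions, not the exercise solution: the toy DataFrame, the uniform amounts, and the rule that labels mid-range amounts as class 1 are all invented here so that the manager's hypothesis holds by construction. Note that scikit-learn's chi2 expects non-negative feature values, which both columns satisfy.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic stand-in for the credit data (assumption, not the real dataset)
rng = np.random.default_rng(0)
toy = pd.DataFrame({'credit_amount': rng.uniform(250, 18000, size=1000)})

# Non-linear transformation: absolute distance from the column mean
toy['diff'] = np.abs(toy['credit_amount'] - toy['credit_amount'].mean())

# Hypothetical labelling rule: the half of applications closest to the
# mean amount are "safe" (1), the rest are "risky" (0)
toy['class'] = (toy['diff'] < toy['diff'].median()).astype(int)

# Ask the selector to keep the single best feature according to chi2
selector = SelectKBest(chi2, k=1)
selector.fit(toy[['credit_amount', 'diff']], toy['class'])

scores = selector.scores_         # one chi2 score per candidate column
support = selector.get_support()  # boolean mask marking the selected column
print(dict(zip(['credit_amount', 'diff'], support)))
```

In this toy setup the transformed column should be the one selected, because the class depends on distance from the mean rather than on the raw amount. On real data the outcome depends on whether the hypothesized relationship actually exists.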

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Function computing absolute difference from column mean
def abs_diff(x):
    return ____(x-____)

# Apply it to the credit amount and store to new column
credit['diff'] = ____

# Create a feature selector with chi2 that picks one feature
sk = ____(chi2, ____)

# Use the selector to pick between credit_amount and diff
sk.fit(____, credit['class'])

# Inspect the results
sk.____()