Practicing Machine Learning Interview Questions in Python

Exercise

Log and power transformations

In the last exercise, you compared the distributions of a training set and a test set of loan_data. This comparison is especially relevant in a machine learning interview because the observed distribution determines whether you need techniques that nudge your feature distributions toward a normal distribution so that normality assumptions are not violated.

In this exercise, you will apply log and power transformations to the Years of Credit History feature of loan_data using the boxcox() function from the scipy.stats module. You will also use the distplot() function from seaborn, which plots both a histogram of the feature and its kernel density estimate.
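As a quick illustration of how the log and square-root cases relate to Box-Cox, here is a minimal sketch on made-up positive values (the array x is hypothetical and not part of loan_data). With a fixed lmbda, scipy.stats.boxcox computes log(x) when lmbda is 0 and (x**lmbda - 1) / lmbda otherwise, so lmbda=0.5 gives a shifted and scaled square root.

```python
import numpy as np
from scipy.stats import boxcox

# Made-up positive values; Box-Cox requires strictly positive input
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

# lmbda=0 is the log transformation
log_x = boxcox(x, lmbda=0)

# lmbda=0.5 is the square-root transformation: (sqrt(x) - 1) / 0.5
sqrt_x = boxcox(x, lmbda=0.5)

# Compare against the closed-form expressions
print(np.allclose(log_x, np.log(x)))
print(np.allclose(sqrt_x, (np.sqrt(x) - 1) / 0.5))
```

When lmbda is supplied, boxcox() returns only the transformed array. When it is left out, the function also returns the fitted lambda.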

All relevant packages have been imported for you.

Here is where you are in the pipeline:

[Figure: Machine learning pipeline]

Instructions 1/3

  1. Subset loan_data for 'Years of Credit History', store it as cr_yrs, and plot its distribution and kernel density estimate (kde) using distplot().
  2. Apply a log transformation to cr_yrs using Box-Cox (lmbda=0) and plot its distribution and kde.
  3. Apply a square-root transformation to 'Years of Credit History' using Box-Cox (lmbda=0.5) and plot its distribution and kde.
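The three steps could be sketched as follows. This is not the course solution. It assumes a synthetic, right-skewed stand-in for loan_data, because the real dataset is not available here. The helper plot_hist_kde is hypothetical and uses seaborn's histplot(kde=True) as a stand-in for distplot(), which newer seaborn releases deprecate. Sample skewness is printed as a rough numeric check of how far each version is from symmetry.

```python
import numpy as np
import pandas as pd
from scipy.stats import boxcox, skew

# Synthetic stand-in for loan_data (assumption: strictly positive, right-skewed values)
rng = np.random.default_rng(0)
loan_data = pd.DataFrame(
    {"Years of Credit History": rng.gamma(shape=4.0, scale=4.5, size=1000) + 1.0}
)

# Step 1: subset the feature
cr_yrs = loan_data["Years of Credit History"]

# Step 2: log transformation via Box-Cox (lmbda=0)
cr_yrs_log = boxcox(cr_yrs, lmbda=0.0)

# Step 3: square-root transformation via Box-Cox (lmbda=0.5)
cr_yrs_sqrt = boxcox(cr_yrs, lmbda=0.5)


def plot_hist_kde(values, title):
    """Hypothetical helper: histogram plus KDE, standing in for distplot()."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.histplot(values, kde=True)
    ax.set_title(title)
    plt.show()


# Sample skewness of each version; values closer to 0 indicate a more symmetric shape
for name, values in [("raw", cr_yrs), ("log", cr_yrs_log), ("sqrt", cr_yrs_sqrt)]:
    print(f"{name}: skewness = {skew(values):.3f}")
```

Calling plot_hist_kde(cr_yrs, "raw"), plot_hist_kde(cr_yrs_log, "log"), and plot_hist_kde(cr_yrs_sqrt, "sqrt") would draw the three plots the instructions ask for. In the exercise environment itself, each series would be passed to distplot() instead.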