Exercise

Treat / Transform extreme values of LoanAmount and ApplicantIncome

Let’s analyze LoanAmount first. Its extreme values are plausible in practice: some people might apply for high-value loans due to specific needs.

train['LoanAmount'].hist(bins=20)

So instead of treating them as outliers, let’s try a log transformation to reduce their effect:

import numpy as np
# Log-transform LoanAmount to compress the large values, then plot the result
train['LoanAmount_log'] = np.log(train['LoanAmount'])
train['LoanAmount_log'].hist(bins=20)

Now the distribution looks much closer to normal, and the effect of the extreme values has been significantly reduced.
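One way to check the "closer to normal" claim numerically, rather than by eye, is to compare skewness before and after the transformation. The sketch below is only an illustration: it uses synthetic, right-skewed values (not the course's loan data), and the names loan_amount, skew_raw and skew_log are made up for this example.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed values standing in for loan amounts.
# These are NOT the course data; they are generated only for illustration.
rng = np.random.default_rng(0)
loan_amount = pd.Series(rng.lognormal(mean=5.0, sigma=0.6, size=1000))

# Apply the same log transformation as in the exercise
loan_amount_log = np.log(loan_amount)

# Skewness measures asymmetry; a value near 0 means a roughly symmetric distribution
skew_raw = loan_amount.skew()
skew_log = loan_amount_log.skew()

print("skewness before log:", skew_raw)
print("skewness after log: ", skew_log)
```

On data like this, the skewness after the log should sit much closer to zero than before, which is what the histogram shows visually.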

Instructions

  • Add ApplicantIncome and CoapplicantIncome together and store the sum as TotalIncome
  • Apply a log transformation to TotalIncome to deal with extreme values
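The two instructions could be approached roughly as follows. This is a minimal sketch, not an official solution: the small train DataFrame here is a made-up stand-in for the course data, and the column name TotalIncome_log is an assumption that mirrors LoanAmount_log above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's `train` DataFrame (values are made up)
train = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000, 81000],
    "CoapplicantIncome": [0.0, 1508.0, 0.0, 0.0],
})

# Step 1: combine both incomes into a single column
train["TotalIncome"] = train["ApplicantIncome"] + train["CoapplicantIncome"]

# Step 2: log-transform the combined income to compress the extreme values.
# TotalIncome_log is an assumed name, chosen to match LoanAmount_log.
train["TotalIncome_log"] = np.log(train["TotalIncome"])

print(train[["TotalIncome", "TotalIncome_log"]])
```

In the exercise itself, train is already loaded, so only the two assignment lines would be needed, and train['TotalIncome_log'].hist(bins=20) would plot the result as before. Note that the logarithm of 0 is negative infinity, so a zero CoapplicantIncome would be a problem if that column were transformed on its own; transforming the sum avoids this as long as the total is positive.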