
Exercise

Understanding distribution of categorical variables

We have looked at the distributions of ApplicantIncome and LoanIncome; now it is time to look at the categorical variables in more detail. For instance, let's see whether Gender affects the loan status. One way to check this is a cross-tabulation, as shown below:

pd.crosstab(train["Gender"], train["Loan_Status"], margins=True)
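To see what the cross-tabulation produces, here is a minimal sketch on a small made-up DataFrame. The name `toy` and its values are assumptions for illustration only; they stand in for the course's `train` data and are not taken from it.

```python
import pandas as pd

# Hypothetical toy data standing in for the course's `train` DataFrame
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"],
    "Loan_Status": ["Y", "N", "Y", "N", "Y", "Y"],
})

# Count each Gender / Loan_Status combination; margins=True adds an
# "All" row and an "All" column holding the row and column totals
counts = pd.crosstab(toy["Gender"], toy["Loan_Status"], margins=True)
print(counts)
```

Each cell is the number of rows with that Gender and Loan_Status pair, and the "All" entries are the totals that the proportion step later divides by.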

Next, we can also look at proportions, which can be more intuitive for drawing quick insights. We can do this using the apply function. You can read more about cross tab and apply functions here.


def percentageConvert(ser):
  # Divide every value in the row by its last entry, the "All" total
  return ser / ser.iloc[-1]

pd.crosstab(train["Gender"], train["Loan_Status"], margins=True).apply(percentageConvert, axis=1)
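The row-wise conversion can be sketched on a small made-up DataFrame as follows. The name `toy`, its values, and the helper name `percentage_convert` are assumptions for illustration. The sketch also shows the `normalize="index"` option of `pd.crosstab`, which gives the same row proportions without a custom function.

```python
import pandas as pd

# Hypothetical toy data standing in for the course's `train` DataFrame
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"],
    "Loan_Status": ["Y", "N", "Y", "N", "Y", "Y"],
})

counts = pd.crosstab(toy["Gender"], toy["Loan_Status"], margins=True)

def percentage_convert(row):
    # The last entry of each row is the "All" total; select it by position
    return row / row.iloc[-1]

# axis=1 passes one row at a time, so each row is divided by its own total
manual = counts.apply(percentage_convert, axis=1)

# Row proportions computed directly by crosstab (no margins requested here)
builtin = pd.crosstab(toy["Gender"], toy["Loan_Status"], normalize="index")

print(manual)
print(builtin)
```

In `manual` the "All" column becomes 1.0 for every row, which is a quick check that the division used the row totals.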

Instructions

  • Use value_counts() with train['Loan_Status'] to look at the frequency distribution
  • Use crosstab with Loan_Status and Credit_History to perform bivariate analysis
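As a rough illustration of the two calls named in the instructions, here is a sketch on made-up data. The DataFrame `toy`, its values, and the column spellings `Loan_Status` and `Credit_History` are assumptions; check the actual column names in `train` before reusing it.

```python
import pandas as pd

# Hypothetical toy data; column names are assumed, values are invented
toy = pd.DataFrame({
    "Credit_History": [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0],
    "Loan_Status": ["Y", "N", "Y", "N", "N", "Y", "Y"],
})

# Frequency distribution of a single categorical column
freq = toy["Loan_Status"].value_counts()

# Same distribution expressed as fractions of the total
share = toy["Loan_Status"].value_counts(normalize=True)

# Bivariate view: counts for each Credit_History / Loan_Status pair
table = pd.crosstab(toy["Credit_History"], toy["Loan_Status"])

print(freq)
print(share)
print(table)
```

`value_counts()` summarizes one variable at a time, while `crosstab` shows how two variables vary together.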