
Exercise

Understanding distribution of numerical variables

Now that we are familiar with the basic characteristics of the data, let us study the distribution of the numerical variables, starting with "ApplicantIncome".

First, plot a histogram of ApplicantIncome with 50 bins using the following command:

train['ApplicantIncome'].hist(bins=50)

Or, equivalently, using attribute access:
train.ApplicantIncome.hist(bins=50)
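
As a self-contained sketch of the histogram command above: the course's train DataFrame is not reproduced here, so the example builds a hypothetical stand-in with synthetic, right-skewed incomes (the values, the random seed, and the axis labels are all assumptions for illustration). It also computes the same 50-bin counts with numpy to show what the bins argument controls.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's `train` DataFrame:
# 500 synthetic, right-skewed income values (not the real data).
rng = np.random.default_rng(0)
train = pd.DataFrame(
    {"ApplicantIncome": rng.lognormal(mean=8.3, sigma=0.6, size=500).round()}
)

# bins=50 splits the range from min to max into 50 equal-width intervals;
# Series.hist draws one bar per interval and returns the matplotlib Axes.
ax = train["ApplicantIncome"].hist(bins=50)
ax.set_xlabel("ApplicantIncome")
ax.set_ylabel("Number of applicants")

# The same binning computed numerically, without drawing anything.
counts, edges = np.histogram(train["ApplicantIncome"], bins=50)
```

In a notebook the figure is displayed automatically; in a plain script, call plt.show() afterwards. Fewer bins give a smoother but coarser picture, while more bins reveal finer detail at the cost of noisier bars.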

Next, we can also look at box plots to understand the distribution. A box plot for ApplicantIncome can be drawn with:

train.boxplot(column='ApplicantIncome')
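
To make the box plot easier to read, the sketch below computes the numbers it is built from. It uses a small hypothetical train DataFrame (eight made-up incomes, not the course data) and assumes matplotlib's default whisker rule, where points more than 1.5 times the interquartile range beyond the box are drawn as individual outliers.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for `train`: eight made-up incomes, one very large.
train = pd.DataFrame(
    {"ApplicantIncome": [2500, 3000, 3500, 4000, 4500, 5000, 6000, 20000]}
)

income = train["ApplicantIncome"]
q1 = income.quantile(0.25)   # bottom edge of the box
median = income.median()     # line inside the box
q3 = income.quantile(0.75)   # top edge of the box
iqr = q3 - q1                # height of the box

# Default whisker rule (assumed): points outside these fences are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = income[(income < lower_fence) | (income > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("Outliers:", outliers.tolist())

# The plot itself, as in the exercise.
ax = train.boxplot(column="ApplicantIncome")
```

With these values only the income of 20000 lies beyond the upper fence, so it is the single point drawn above the whisker.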

Instructions

  • Use hist() to plot a histogram
  • Use by=categorical_variable with a box plot to compare the distribution across categories, for example:
train.boxplot(column='ApplicantIncome', by='Gender')
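
A minimal sketch of the grouped box plot, again on a hypothetical train DataFrame (six made-up rows; the real data has many more). The groupby line is an addition for illustration: it prints the per-category medians that appear as the centre lines of the boxes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for `train` with a categorical and a numeric column.
train = pd.DataFrame(
    {
        "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
        "ApplicantIncome": [5000, 4000, 6000, 4500, 30000, 3500],
    }
)

# One box per category of `Gender`, drawn side by side on a shared axis.
train.boxplot(column="ApplicantIncome", by="Gender")

# Numeric counterpart: the median of ApplicantIncome within each category.
medians = train.groupby("Gender")["ApplicantIncome"].median()
print(medians)
```

Comparing the boxes side by side shows whether the typical income and its spread differ between the categories, which a single overall box plot would hide.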