Exercise

Understanding the Data

You can view a summary of the numerical fields with dataframe.describe(). For each numerical column, its output gives the count, mean, standard deviation (std), min, quartiles (25%, 50%, 75%) and max.

dataframe.describe()
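
As a rough illustration of what describe() returns, here is a minimal sketch on a small made-up table. The data and the ApplicantIncome and LoanAmount column names are assumptions for illustration only, not the exercise's dataset.

```python
import pandas as pd

# Hypothetical toy data (not the exercise's dataset).
df = pd.DataFrame({
    "ApplicantIncome": [5000, 3000, 4000, 6000],
    "LoanAmount": [120.0, 100.0, None, 140.0],
    "PropertyArea": ["Urban", "Rural", "Urban", "Semiurban"],
})

# By default, describe() on a mixed-type dataframe summarises
# only the numeric columns, so PropertyArea is left out.
summary = df.describe()
print(summary)

# count excludes missing values, so comparing it with len(df)
# is a quick way to spot columns with gaps.
print(len(df) - summary.loc["count"])
```

In this sketch the count for LoanAmount is lower than the number of rows, which points to a missing value in that column.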

For non-numeric variables (e.g. PropertyArea, CreditHistory), we can look at the frequency distribution instead. The frequency table can be printed with the following command:

df[column_name].value_counts()

OR, when the column name is a valid Python identifier:
df.column_name.value_counts()
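
The two forms above can be sketched on a small made-up column as follows. The data is hypothetical; the dropna and normalize arguments are standard value_counts() options that the exercise itself does not require.

```python
import pandas as pd

# Hypothetical toy data (not the exercise's dataset).
df = pd.DataFrame({
    "PropertyArea": ["Urban", "Rural", "Urban", "Semiurban", "Urban", None],
})

# Frequency table, most common value first; missing values are skipped.
counts = df["PropertyArea"].value_counts()
print(counts)

# Attribute access gives the same result for this column name.
same_counts = df.PropertyArea.value_counts()

# Optional: count missing values too, or show shares instead of counts.
counts_with_na = df["PropertyArea"].value_counts(dropna=False)
shares = df["PropertyArea"].value_counts(normalize=True)
print(shares)
```

Bracket access is the safer default, since it also works for column names that contain spaces or clash with dataframe method names.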
Instructions
  • Use dataframe.describe() to understand the distribution of numerical variables
  • Look at the unique values of non-numeric variables using df[column_name].value_counts()