
Understanding the Data

You can look at a summary of the numerical fields by using dataframe.describe(). For each numerical column, its output gives the count of non-missing values, the mean, the standard deviation (std), the minimum, the quartiles (25%, 50% and 75%) and the maximum.

dataframe.describe()
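As a rough illustration, the sketch below runs describe() on a small, entirely hypothetical DataFrame (the column names only mimic the loan data set used in this course; the values are made up). It shows that the non-numeric Property_Area column is left out by default and that "count" skips missing values.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only
df = pd.DataFrame({
    "ApplicantIncome": [2000, 3000, 4000, 5000, 6000],
    "LoanAmount": [100.0, 120.0, None, 150.0, 130.0],  # one missing value
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban", "Urban"],
})

# With mixed column types, describe() summarises only the numeric columns
summary = df.describe()
print(summary)
```

Here the "count" row reports 4 for LoanAmount rather than 5, because the missing entry is not counted.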

For categorical variables such as Property_Area and Credit_History, we can look at the frequency distribution instead. The frequency table can be printed with either of the following commands:

df[column_name].value_counts()

or, if the column name is a valid Python identifier:
df.column_name.value_counts()
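A minimal sketch of both forms, again on a hypothetical toy DataFrame rather than the course data: bracket access and attribute access return the same frequency table, sorted from most to least frequent. It also shows the `dropna=False` argument, which keeps missing values in the counts.

```python
import pandas as pd

# Hypothetical toy data standing in for the real training set
df = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban", "Semiurban", "Urban"],
    "Credit_History": [1.0, 0.0, 1.0, 1.0, None, 1.0],
})

# Bracket access works for any column name
area_counts = df["Property_Area"].value_counts()
# Attribute access gives the same result when the name is a valid identifier
via_attribute = df.Property_Area.value_counts()
print(area_counts)

# By default missing values are dropped; dropna=False counts them as well
history_counts = df["Credit_History"].value_counts(dropna=False)
print(history_counts)
```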

This exercise is part of the course Introduction to Python & Machine Learning (with Analytics Vidhya Hackathons).


Exercise instructions

  • Use dataframe.describe() to understand the distribution of the numerical variables
  • Look at the unique values of the non-numeric variables using df[column_name].value_counts()
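When a data set has several non-numeric columns, one possible approach (a sketch, not the course's prescribed method) is to select them with `select_dtypes(exclude="number")` and build a frequency table for each. The DataFrame and the `frequency_tables` name below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data mixing categorical and numeric columns
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male"],
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban"],
    "LoanAmount": [100.0, 120.0, 130.0, 90.0],
})

# Keep every column that is not numeric
categorical_cols = df.select_dtypes(exclude="number").columns

# One frequency table per non-numeric column
frequency_tables = {col: df[col].value_counts() for col in categorical_cols}

for col, table in frequency_tables.items():
    print(f"--- {col} ---")
    print(table)
```

Excluding numeric types, rather than including only "object", is a choice made here so that string columns are picked up whichever string dtype pandas assigns to them.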

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# The training and testing data sets are already loaded into the train and test DataFrames, respectively

# Look at the summary of the numerical variables in the train data set
df = train.describe()
print(df)

# Print the unique values of Property_Area and their frequencies
df1 = train.Property_Area.value_counts()
print(df1)