Understanding the Data
Use dataframe.describe() to get a summary of the numerical fields. Its output gives the count, mean, standard deviation (std), minimum, quartiles (25%, 50%, 75%) and maximum of each numeric column.
dataframe.describe()
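As a minimal sketch, assuming pandas is installed, here is what describe() does on a small made-up dataframe. The name toy and all values below are illustrative assumptions, not the course's loan data.

```python
import pandas as pd

# Hypothetical toy data (assumption for illustration, not the course's train dataframe)
toy = pd.DataFrame({
    "ApplicantIncome": [5000, 3000, 4000, 6000],
    "LoanAmount": [120.0, 100.0, None, 150.0],  # one missing value
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban"],
})

# By default, describe() summarizes only the numeric columns
summary = toy.describe()
print(summary)
```

In this sketch the text column Property_Area is left out of the summary, and the count for LoanAmount is 3 rather than 4 because missing values are not counted.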
For non-numeric fields (e.g. Property_Area, Credit_History), we can look at the frequency distribution instead. The frequency table can be printed with either of the following equivalent commands:
df[column_name].value_counts()
df.column_name.value_counts()
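The two forms can be compared on a small made-up column. This is only a sketch: the dataframe toy and its values are assumptions for illustration, not the course data.

```python
import pandas as pd

# Hypothetical toy data (assumption for illustration)
toy = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban", "Urban", "Rural"],
})

# Bracket access: works for any column name
counts = toy["Property_Area"].value_counts()
print(counts)

# Attribute access: same result for this column
same = toy.Property_Area.value_counts()
```

value_counts() returns the frequencies sorted from most to least common. The bracket form is the safer choice in general, since attribute access only works when the column name is a valid Python identifier and does not clash with an existing dataframe attribute or method.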
This exercise is part of the course
Introduction to Python & Machine Learning (with Analytics Vidhya Hackathons)
Exercise instructions
- Use dataframe.describe() to understand the distribution of numerical variables
- Look at unique values of non-numeric variables using df[column_name].value_counts()
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# The training and testing data sets are loaded in the train and test dataframes respectively
# Look at the summary of numerical variables for the train data set
df = train.________()
print(df)
# Print the unique values and their frequencies for the variable Property_Area
df1 = train.Property_Area.________()
print(df1)