What does your data look like? (I)
Up until now you have focused on creating new features and dealing with issues in your data. Feature engineering can also be used to make the most out of the data that you already have and use it more effectively when creating machine learning models.
Many algorithms assume that your data is normally distributed, or at least that all your columns are on the same scale. This will often not be the case: for example, one feature may be measured in thousands of dollars while another is measured in years. In this exercise, you will create plots to examine the distributions of some numeric columns of the so_survey_df DataFrame, which are stored in so_numeric_df.
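Before plotting, a quick numeric summary can already show how far apart the column scales are. The sketch below is only an illustration: it uses synthetic data, and the names demo_df, salary_usd and years_experience are hypothetical stand-ins, not columns taken from so_numeric_df.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names and values are hypothetical.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "salary_usd": rng.lognormal(mean=11, sigma=0.5, size=500),
    "years_experience": rng.integers(0, 40, size=500),
})

# Summary statistics make the difference in scale between columns visible.
summary = demo_df.describe().loc[["mean", "std", "min", "max"]]
print(summary)

# Sample skewness: values far from 0 indicate an asymmetric distribution.
skewness = demo_df.skew()
print(skewness)
```

A column whose maximum is orders of magnitude larger than another's, or whose skewness is far from zero, is a candidate for closer inspection with a histogram.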
This exercise is part of the course Feature Engineering for Machine Learning in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a histogram
____
plt.show()
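For reference, one possible way to draw one histogram per numeric column is pandas' DataFrame.hist method. This is a minimal sketch on synthetic data, not the exercise's expected answer: demo_numeric_df and its column names are hypothetical stand-ins for so_numeric_df, and the choice of 30 bins is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for so_numeric_df; names and values are hypothetical.
rng = np.random.default_rng(0)
demo_numeric_df = pd.DataFrame({
    "salary_usd": rng.lognormal(mean=11, sigma=0.5, size=500),
    "years_experience": rng.integers(0, 40, size=500),
})

# DataFrame.hist draws one histogram per numeric column
# and returns an array of matplotlib Axes.
axes = demo_numeric_df.hist(bins=30)

plt.tight_layout()
plt.show()
```

Each subplot gets its own x-axis, so columns on very different scales can be compared by shape even when their ranges differ by orders of magnitude.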