1. Fill continuous missing values
While listwise deletion is often the most statistically sound way to deal with missing values when you believe the values are missing at random, it is often not feasible in real-world use cases.
2. Deleting missing values
One of the most common problems with removing all rows that contain missing values arises when you are building a predictive model. If you removed every row with a missing value when training your model, you would quickly run into problems when missing values showed up in your test set, where you do not have the option of simply not predicting those rows.
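As a rough illustration of how much data listwise deletion can cost, here is a minimal sketch on a made-up DataFrame. The data and the YearsCoding column are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the real survey; YearsCoding is a made-up column
df = pd.DataFrame({
    "ConvertedSalary": [50000.0, np.nan, 72000.0, np.nan],
    "YearsCoding": [3, 5, np.nan, 10],
})

# Listwise deletion: drop every row that has at least one missing value
complete_rows = df.dropna()

# Count how many rows were thrown away
rows_lost = len(df) - len(complete_rows)
print(f"Kept {len(complete_rows)} of {len(df)} rows, lost {rows_lost}")
```

Only one of the four rows survives here, and a model trained on it would still have no way to handle a new row that arrives with a gap.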
3. What else can you do?
So what's the alternative? Replacing missing values.
For categorical columns, as you saw in the last lesson, you can either replace missing values with a string that flags them as missing, such as 'None', or you can use the most commonly occurring value.
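Both categorical options might look something like this. This is a sketch on a hypothetical Series, not the lesson's actual data.

```python
import pandas as pd

# Hypothetical categorical column with two missing entries
gender = pd.Series(["Male", None, "Female", "Male", None])

# Option 1: flag missing entries with a placeholder string
flagged = gender.fillna("None")

# Option 2: fill with the most commonly occurring value
# mode() ignores missing values by default and returns a Series, so take the first entry
most_common = gender.mode()[0]
filled_mode = gender.fillna(most_common)
```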
However, for numeric columns, you may want to replace missing values with a more suitable value. So what is a suitable value?
4. Measures of central tendency
In cases like this we often turn to the measures of central tendency, which describe the central or typical value of a distribution. The most commonly used measures are the mean and the median.
One caveat that you must keep in mind when using these methods is that they can lead to biased estimates of the variances and covariances of the features. Similarly, the standard error and test statistics can be incorrectly estimated, so if these metrics are needed they should be calculated before the missing values have been filled.
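To see this caveat in action, here is a small sketch with made-up numbers. Filling with the mean adds values that sit exactly at the center, so the estimated variance shrinks.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries with two missing entries
salary = pd.Series([40000.0, 60000.0, np.nan, 80000.0, np.nan])

# Variance of the observed values only (missing values are skipped)
var_before = salary.var()

# Fill the gaps with the mean, then recompute the variance
filled = salary.fillna(salary.mean())
var_after = filled.var()

print(var_before, var_after)
```

In this toy example the variance after filling is half of what it was before, which is why such statistics are better computed before the fill.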
5. Calculating the measures of central tendency
You can calculate these measures directly from a pandas Series by calling the required method on the Series, as shown here. Note that the missing values are excluded by default when calculating these statistics.
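A minimal sketch of these calls follows. The DataFrame name so_survey_df and its values are assumptions for illustration; only the ConvertedSalary column name comes from the lesson.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey data
so_survey_df = pd.DataFrame(
    {"ConvertedSalary": [30000.0, np.nan, 50000.0, 100000.0]}
)

# Both methods skip missing values by default (skipna=True)
mean_salary = so_survey_df["ConvertedSalary"].mean()
median_salary = so_survey_df["ConvertedSalary"].median()

print(mean_salary)    # 60000.0
print(median_salary)  # 50000.0
```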
6. Fill the missing values
Then, leveraging what you implemented in the previous lesson, you can directly fill all missing values using the fillna() method. Only this time you are filling missing values in the ConvertedSalary column with the mean of this column.
Since you filled in the missing values with the mean, you may end up with too many decimal places. You can get rid of all the decimal values by changing the data type to integer using the astype() method like so.
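These two steps could be sketched as follows, again on a hypothetical DataFrame (the name so_survey_df and the values are assumed). One thing to be aware of is that astype() drops the decimal part rather than rounding it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey data; the mean of the observed values is 60000.33...
so_survey_df = pd.DataFrame(
    {"ConvertedSalary": [30000.0, np.nan, 50000.0, 100001.0]}
)

# Fill missing values with the column mean
so_survey_df["ConvertedSalary"] = so_survey_df["ConvertedSalary"].fillna(
    so_survey_df["ConvertedSalary"].mean()
)

# Convert to integer; this truncates the decimal part (60000.33... becomes 60000)
so_survey_df["ConvertedSalary"] = so_survey_df["ConvertedSalary"].astype("int64")
```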
7. Rounding values
Or you can first round the mean before filling in the missing values, as shown here.
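A sketch of the round-first variant, under the same assumptions as before (hypothetical so_survey_df and values). Using Python's built-in round() is one way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey data; the mean of the observed values is 60000.66...
so_survey_df = pd.DataFrame(
    {"ConvertedSalary": [30000.0, np.nan, 50000.0, 100002.0]}
)

# Round the mean to the nearest whole number first
rounded_mean = round(so_survey_df["ConvertedSalary"].mean())

# Then fill the missing values with the rounded mean
so_survey_df["ConvertedSalary"] = so_survey_df["ConvertedSalary"].fillna(rounded_mean)
```

Here the filled value is rounded up to 60001, whereas converting with astype() after filling would have truncated it to 60000.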
8. Let's Practice!
Now it's your turn to put what you have learned into practice.