1. Missing value imputation using transform()
Welcome back! Now that we have seen why and how to use the transform() function on a grouped pandas object, we will address a very specific task: missing value imputation.
2. Counting missing values
Before we actually see how we can use the transform() function for missing value imputation, we will see how many missing values there are in our variable of interest in each of the groups.
We created a restaurant_nan dataset, in which the total bill of 45 random observations was set to NaN. We group the data according to the time of the day each meal was recorded, before and after adding the random missing values.
Then, we count the number of non-missing values in each instance, and we print the difference.
In this instance we see that there are 32 missing values in dinner meals and 13 in lunch meals. Bear in mind that the missing entries were chosen at random, so regenerating them would most likely change this split.
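To make the counting step concrete, here is a minimal sketch. It assumes a synthetic stand-in for the restaurant data, since the lesson's dataset is not reproduced here: the column names time and total_bill, the 244 rows, and the random seed are all assumptions, so the split between dinner and lunch will not match the 32 and 13 quoted above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the restaurant data (names and sizes are assumptions)
rng = np.random.default_rng(0)
restaurant = pd.DataFrame({
    "time": rng.choice(["Dinner", "Lunch"], size=244, p=[0.7, 0.3]),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=244).round(2),
})

# Set the total bill of 45 randomly chosen rows to NaN
restaurant_nan = restaurant.copy()
nan_rows = rng.choice(restaurant.index.to_numpy(), size=45, replace=False)
restaurant_nan.loc[nan_rows, "total_bill"] = np.nan

# count() skips NaN, so the difference between the two counts
# is the number of missing values in each group
counts_before = restaurant.groupby("time")["total_bill"].count()
counts_after = restaurant_nan.groupby("time")["total_bill"].count()
missing_per_group = counts_before - counts_after
print(missing_per_group)
```

The two group counts always add up to 45 here, but how they divide between dinner and lunch depends on which rows the random draw happens to pick.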
3. Missing value imputation
After counting the number of missing values in our data, we will show how to fill the missing values with a group-specific function. The most common choices are the mean and the median, and the selection depends on the skewness of the data: the median is less sensitive to extreme values, so it is usually the safer choice when the data are strongly skewed.
As we did before, we define a lambda transformation that uses the fillna() function to replace every missing value with its group mean. We then group our data according to the time of the meal and replace the missing values by applying this pre-defined transformation.
As we can see, the total bills at index 0 and index 6 are exactly the same, which indicates that both were missing and have been replaced by their group's mean.
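The imputation step can be sketched as follows. The six-row toy frame and the names missing_trans and restaurant_imputed are chosen here for illustration only; they are assumptions, not the lesson's actual data or code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing bill per group
restaurant_nan = pd.DataFrame({
    "time": ["Dinner", "Dinner", "Dinner", "Lunch", "Lunch", "Lunch"],
    "total_bill": [np.nan, 20.0, 30.0, 10.0, np.nan, 14.0],
})

# Fill each missing value with the mean of the group it belongs to
missing_trans = lambda x: x.fillna(x.mean())

# transform() returns a result aligned with the original index,
# so it can be assigned straight back to the column
restaurant_imputed = restaurant_nan.copy()
restaurant_imputed["total_bill"] = (
    restaurant_nan.groupby("time")["total_bill"].transform(missing_trans)
)
print(restaurant_imputed)
```

In this toy frame the missing dinner bill becomes 25.0 (the mean of 20 and 30) and the missing lunch bill becomes 12.0 (the mean of 10 and 14). Swapping x.mean() for x.median() would give the median-based variant.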
4. Comparison with native methods
Our goal in this lesson is to show whether the use of the transform() function performs the task of imputing missing values faster than the native Python way.
This script performs group-wise missing value imputation, like the transform() function does.
Comparing the efficiency of both methods, we can clearly see that the transform() function applied on a grouped object performs faster than the native Python code for this task.
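A comparison of this kind could be set up roughly as below. The plain-Python baseline impute_native is a hypothetical one written for this sketch, not the script used in the lesson, and the data are synthetic. Measured times depend on the data size, the machine, and how the baseline is written, so the sketch only prints what it measures and checks that both approaches agree.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data (an assumption): 20,000 meals, one fifth of the bills missing
rng = np.random.default_rng(1)
n = 20_000
df = pd.DataFrame({
    "time": rng.choice(["Dinner", "Lunch"], size=n),
    "total_bill": rng.gamma(4.0, 5.0, size=n),
})
df.loc[rng.choice(n, size=n // 5, replace=False), "total_bill"] = np.nan


def impute_native(frame):
    """Hypothetical baseline: group-wise mean imputation with plain Python loops."""
    times = frame["time"].tolist()
    bills = frame["total_bill"].tolist()
    sums, counts = {}, {}
    # First pass: accumulate the sum and count of observed bills per group
    for t, b in zip(times, bills):
        if b == b:  # NaN is the only value that is not equal to itself
            sums[t] = sums.get(t, 0.0) + b
            counts[t] = counts.get(t, 0) + 1
    means = {t: sums[t] / counts[t] for t in sums}
    # Second pass: keep observed bills, fill missing ones with the group mean
    return [b if b == b else means[t] for t, b in zip(times, bills)]


def impute_transform(frame):
    """Group-wise mean imputation with groupby().transform()."""
    return frame.groupby("time")["total_bill"].transform(
        lambda x: x.fillna(x.mean())
    )


start = time.perf_counter()
native_result = impute_native(df)
native_seconds = time.perf_counter() - start

start = time.perf_counter()
transform_result = impute_transform(df)
transform_seconds = time.perf_counter() - start

print(f"native loop: {native_seconds:.4f} s")
print(f"transform(): {transform_seconds:.4f} s")
```

A single timed run like this is noisy; repeating each call several times, for example with the timeit module, gives a steadier picture.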
5. Let's do it!
Now that we showed how efficient the transform() function is for group-wise missing value imputation, let's have you try it in the following exercises.