Visualizing time-series imputations
1. Visualizing time-series imputations
You have learned how to impute missing values in time-series data, but you also need to compare the quality of those imputations. In this lesson, we'll graphically compare and analyze each of these imputations and decide which one works best.
2. Air quality time-series plot
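A minimal sketch of the plot on this slide. The real 'airquality' data is not reproduced here, so the snippet builds a small made-up daily 'Ozone' series with gaps; the dates and values are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's 'airquality' DataFrame (made-up values).
dates = pd.date_range("2021-05-01", periods=10, freq="D")
airquality = pd.DataFrame(
    {"Ozone": [41.0, 36.0, np.nan, np.nan, 18.0, 28.0, np.nan, 19.0, np.nan, 8.0]},
    index=dates,
)

# Line plot with a marker at every observed point; NaN values appear as gaps.
fig, ax = plt.subplots()
airquality["Ozone"].plot(ax=ax, title="Ozone", marker="o")

# Count how many points are missing and therefore need imputing.
n_missing = int(airquality["Ozone"].isna().sum())
print(f"Missing Ozone values: {n_missing}")
```

The markers matter here: an observed value that sits between two missing ones has no line segment attached to it, so without 'marker="o"' it would not be drawn at all.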
We'll continue using the 'airquality' time-series DataFrame to visualize and compare the imputations. Let's draw a time-series plot of the 'Ozone' column in the 'airquality' DataFrame. To do so, we can simply call 'airquality['Ozone'].plot()', setting 'title' to 'Ozone' and 'marker' to 'o'. Observe that the resulting plot has many gaps that need to be filled in. Let's analyze how the various imputation techniques we learned in the previous lesson have imputed the data.
3. Ffill Imputation
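A sketch of how the forward-fill plot on this slide could be produced. It assumes 'ffill_imp' was built with 'ffill()' and uses a small made-up 'Ozone' series in place of the real data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's 'airquality' DataFrame (made-up values).
dates = pd.date_range("2021-05-01", periods=10, freq="D")
airquality = pd.DataFrame(
    {"Ozone": [41.0, 36.0, np.nan, np.nan, 18.0, 28.0, np.nan, 19.0, np.nan, 8.0]},
    index=dates,
)

# Forward fill: each NaN takes the last observed value before it.
ffill_imp = airquality.ffill()

fig, ax = plt.subplots()
# Draw the imputed series as a dotted red line ...
ffill_imp["Ozone"].plot(ax=ax, color="red", marker="o", linestyle="dotted")
# ... then overlay the original series, so the stretches that remain
# red-only are exactly the imputed ones.
airquality["Ozone"].plot(ax=ax, title="Ozone", marker="o")
```

Plotting the imputed series first and the original second is a deliberate ordering: the original covers the imputed line wherever data was observed, leaving only the filled-in values visible in red.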
'ffill_imp' is the version of 'airquality' imputed with the forward fill strategy. To visualize this imputation, we'll plot it as a dotted red line. We'll also plot the original 'airquality' DataFrame as before, so that we can distinguish between the imputed and non-missing values. From this plot, we can see that the 'NaN' values have been filled with the last observed value. However, this is not an optimal way to impute the missing values.
4. Bfill Imputation
We can similarly plot 'bfill_imp', the backward-filled 'airquality' DataFrame. Like forward fill, this is a poor imputation technique here, because each run of consecutive 'NaN' values is filled with the same value!
5. Linear Interpolation
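A sketch of the linear interpolation plot on this slide, assuming 'linear_interp' was created with 'interpolate(method="linear")'. The 'Ozone' values below are made up for illustration and are not the real 'airquality' data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's 'airquality' DataFrame (made-up values).
dates = pd.date_range("2021-05-01", periods=10, freq="D")
airquality = pd.DataFrame(
    {"Ozone": [41.0, 36.0, np.nan, np.nan, 18.0, 28.0, np.nan, 19.0, np.nan, 8.0]},
    index=dates,
)

# Linear interpolation: missing points are spaced evenly on the straight
# line joining the observed values on either side of the gap.
linear_interp = airquality.interpolate(method="linear")

fig, ax = plt.subplots()
linear_interp["Ozone"].plot(ax=ax, color="red", marker="o", linestyle="dotted")
airquality["Ozone"].plot(ax=ax, title="Ozone", marker="o")

# The two-day gap between 36 and 18 is filled with 30 and 24.
print(linear_interp["Ozone"].iloc[1:5].tolist())
```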
Let's now plot the imputations made with the 'interpolate()' method. The 'linear_interp' DataFrame contains the imputations made with the 'linear' strategy. The imputed values are quite consistent with the non-missing values of the DataFrame.
6. Quadratic Interpolation
In the plot of the 'quadratic' imputation, the imputed values overshoot before connecting back to the non-missing values. Some imputed values also fall outside the range of the non-missing values.
7. Nearest Interpolation
The 'nearest' imputation behaves like a combination of forward and backward fill: each missing value takes the value of its nearest non-missing neighbor. For this DataFrame, it is a better imputation than either forward or backward fill alone.
8. A comparison of the interpolations
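A sketch of the subplot comparison for this slide, using a made-up 'Ozone' series in place of the real data. The dictionary name 'interpolations' and the figure size are assumptions. Note that pandas delegates the 'quadratic' and 'nearest' methods to SciPy, so SciPy must be installed for this to run.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's 'airquality' DataFrame (made-up values).
dates = pd.date_range("2021-05-01", periods=10, freq="D")
airquality = pd.DataFrame(
    {"Ozone": [41.0, 36.0, np.nan, np.nan, 18.0, 28.0, np.nan, 19.0, np.nan, 8.0]},
    index=dates,
)

# Map each subplot title to its imputed DataFrame.
# 'quadratic' and 'nearest' require SciPy.
interpolations = {
    "Linear Interpolation": airquality.interpolate(method="linear"),
    "Quadratic Interpolation": airquality.interpolate(method="quadratic"),
    "Nearest Interpolation": airquality.interpolate(method="nearest"),
}

# One row per interpolation; iterate over axes and imputations together.
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 12))
for ax, (name, imputed) in zip(axes, interpolations.items()):
    imputed["Ozone"].plot(ax=ax, color="red", marker="o", linestyle="dotted")
    airquality["Ozone"].plot(ax=ax, title=name, marker="o")
fig.tight_layout()
```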
We can visualize all the interpolations in a single figure by creating subplots and iterating over the imputations, as we did in the previous lesson.
9. A comparison of the interpolations
Notice that these interpolations produce more complex imputations than the 'fillna()' method does.
10. A comparison of imputation techniques
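The visual comparison on this slide can also be checked numerically. The sketch below uses a made-up 'Ozone' series (not the real 'airquality' data) and lists, for the originally missing rows only, what forward fill, backward fill, and linear interpolation put in each gap. The names 'comparison' and 'gaps' are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'Ozone' column (made-up values).
dates = pd.date_range("2021-05-01", periods=10, freq="D")
ozone = pd.Series(
    [41.0, 36.0, np.nan, np.nan, 18.0, 28.0, np.nan, 19.0, np.nan, 8.0],
    index=dates,
    name="Ozone",
)

# Put the original and each imputed version side by side.
comparison = pd.DataFrame(
    {
        "original": ozone,
        "ffill": ozone.ffill(),
        "bfill": ozone.bfill(),
        "linear": ozone.interpolate(method="linear"),
    }
)

# Keep only the rows that were missing in the original series.
gaps = comparison[comparison["original"].isna()]
print(gaps)
```

In the two-day gap, forward fill repeats 36 and backward fill repeats 18, while linear interpolation steps through 30 and 24.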
Comparing the time-series plots of all the imputations shows that linear interpolation best imputes the 'airquality' data. Forward, backward, and nearest fill assign the same value to a whole run of missing points, while quadratic interpolation overshoots. Linear interpolation, by contrast, fills each gap with steadily increasing or decreasing values.
11. Summary
In this lesson, you learned to create time-series plots of imputed datasets and to compare them in order to choose the best imputation technique.
12. Let's practice!
Now, go ahead and practice!