1. Congratulations!
Congratulations! You did it! You've finished this course, Dealing With Missing Data in R. This course covered an often-overlooked area of statistics, missing data, and within it an area that is overlooked even more often: how to handle, explore, and visualize missing values.
I really hope that this course has helped you improve your understanding of working with missing data. To recap:
2. Chapter 1
In Chapter 1, we defined missing values as those values that should have been recorded but were not.
You then learned how to summarize missing values, using functions like miss_var_summary() to calculate the number and percentage of missing values in each variable.
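As a quick reminder of what that looks like, here is a minimal sketch. It assumes the naniar package is installed and uses the airquality dataset that ships with R; the object names are only for illustration.

```r
library(naniar)

# One row per variable, with the number and percentage of missing values
aq_summary <- miss_var_summary(airquality)
aq_summary

# Overall count and proportion of missing values in the whole dataset
total_missing <- n_miss(airquality)
share_missing <- prop_miss(airquality)
```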
3. Chapter 1
You also learned how to visualize missing values across the entire dataset with vis_miss(), and to visualize the amount of missingness for variables with gg_miss_var().
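A minimal sketch of those two plots, again assuming naniar and visdat are installed and using airquality as example data. The plots are stored in illustrative variables; print them to draw them.

```r
library(visdat)  # vis_miss()
library(naniar)  # gg_miss_var()

# Heatmap-style overview of missingness across the entire dataset
p_overview <- vis_miss(airquality)

# Number of missing values in each variable
p_by_variable <- gg_miss_var(airquality)

# print(p_overview); print(p_by_variable)
```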
4. Chapter 2
In Chapter 2 you learned how to find unusual missing values like "N/A" using miss_scan_count(). You then learned how to replace and update these with replace_with_na().
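Here is a small sketch of that search-and-replace step. The scores data is made up for illustration, and the choice of "N/A" and -99 as stand-ins for missing values is an assumption of the example, not something from the course data.

```r
library(naniar)
library(tibble)

# Hypothetical data where missing values were recorded as "N/A" and -99
scores <- tibble(
  name  = c("ana", "ben", "N/A", "dee"),
  score = c(10, -99, 7, -99)
)

# Count how often the suspect values appear in each variable
scan_result <- miss_scan_count(data = scores, search = list("N/A", -99))
scan_result

# Replace the suspect values with NA, variable by variable
scores_clean <- replace_with_na(
  scores,
  replace = list(name = "N/A", score = -99)
)
```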
You also learned how to handle implicit missing values using complete() and fill() from tidyr. Implicit missing values are those values that are missing but not listed in the data (for example, when the data lists January and the next month is April: February and March are missing!).
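To make that concrete, here is a small sketch with two made-up tables (visits and ledger are hypothetical names). The first has a combination of values that was never recorded; the second only writes a name on the first row of each block.

```r
library(tidyr)
library(tibble)

# Hypothetical data: there is no row for ben in the "pm" slot
visits <- tibble(
  name  = c("ana", "ana", "ben"),
  time  = c("am", "pm", "am"),
  count = c(3, 5, 2)
)

# complete() adds the missing name/time combination, with count set to NA
visits_complete <- complete(visits, name, time)

# Hypothetical data: the name is only recorded on the first row of each block
ledger <- tibble(
  name  = c("ana", NA, "ben", NA),
  count = c(3, 5, 2, 4)
)

# fill() carries the last observed name downwards
ledger_filled <- fill(ledger, name)
```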
We finished up Chapter 2 by discussing missing data dependence: the concepts of Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR), and what they mean for your data analysis.
5. Chapter 3
In Chapter 3 you learned about missing data workflows.
You learned about two special data structures for working with missing data, the shadow matrix and nabular data, and how to use the as_shadow() and nabular() functions to create them.
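As a reminder, a minimal sketch of both structures, assuming naniar is installed and using airquality as example data:

```r
library(naniar)

# Shadow matrix: one column per variable, suffixed _NA,
# recording whether each value is missing ("NA") or not ("!NA")
aq_shadow <- as_shadow(airquality)

# Nabular data: the original data and its shadow matrix side by side
aq_nabular <- nabular(airquality)

names(aq_nabular)
```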
You then used nabular data in workflows to explore missing data and to link summaries of missingness back to the values in the data.
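One way such a workflow can look is sketched below: it summarizes temperature separately for rows where Ozone is missing and rows where it is not. The choice of variables is just an illustration, and dplyr is assumed to be installed.

```r
library(naniar)
library(dplyr)

aq_nabular <- nabular(airquality)

# Compare Temp between rows where Ozone is missing ("NA") and present ("!NA")
temp_by_ozone_na <- aq_nabular %>%
  group_by(Ozone_NA) %>%
  summarise(
    mean_temp = mean(Temp),
    n = n()
  )

temp_by_ozone_na
```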
6. Chapter 3
You also learned how to use ggplot to explore and visualize how values change as other variables go missing.
Finally, you learned how to visualize missingness across two variables, and how and why to visualize missing values in a scatter plot.
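Both ideas from this part of the chapter can be sketched in a few lines, assuming naniar and ggplot2 are installed. The variables chosen from airquality are only an example.

```r
library(naniar)
library(ggplot2)

aq_nabular <- nabular(airquality)

# How does the distribution of Temp change when Ozone goes missing?
p_density <- ggplot(aq_nabular, aes(x = Temp, colour = Ozone_NA)) +
  geom_density()

# Scatter plot that keeps missing values visible:
# geom_miss_point() places them below the observed range of each axis
p_scatter <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  geom_miss_point()

# print(p_density); print(p_scatter)
```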
7. Chapter 4
In Chapter 4 we discussed how to impute data using the impute_ functions from naniar and simputation, and how to visualize the imputed values.
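A minimal sketch of that workflow follows, assuming naniar, simputation, and ggplot2 are installed. The model Ozone ~ Temp + Wind is only an illustrative choice, and simputation is called with :: to avoid any clash with naniar functions of the same name.

```r
library(naniar)
library(ggplot2)

# Track which values were missing before imputing anything
aq_tracked <- add_label_shadow(nabular(airquality, only_miss = TRUE))

# Impute Ozone from a linear model on Temp and Wind
aq_lm <- simputation::impute_lm(aq_tracked, Ozone ~ Temp + Wind)

# Colour the points by whether Ozone was originally missing
p_imputed <- ggplot(aq_lm, aes(x = Temp, y = Ozone, colour = Ozone_NA)) +
  geom_point()

# print(p_imputed)
```

Because the shadow column Ozone_NA is created before imputation, the imputed values stay identifiable in the plot.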
You also learned how to explore the features of imputed values, to understand what makes them good or bad, and how to compare your imputed values to the original data.
8. Chapter 4
You learned how to use and visualize many imputation models, and how these can affect your subsequent inferences.
9. This is only the beginning!
Now, as they say, this is only the beginning.
To continue your journey and learn more about missing data, you should check out the naniar package, which contains many useful functions to explore and evaluate your missing data.
The visdat package provides more than just heatmaps of missing data and is well worth looking into to learn more about preliminary exploratory visualization.
From here, you might also want to explore other workflows for imputing your missing data.
There are many ways to decide how to impute data. We didn't have time for it in the course, but multiple imputation is another great area to explore. To learn more about multiple imputation, I highly recommend Stef van Buuren's package, mice, and his book, Flexible Imputation of Missing Data.
10. Thank you!
I really hope you enjoyed this course. Now get out there and tackle the world of missing data!