NHANES EDA
Let's examine our newly constructed dataset with a mind toward EDA. As in the last chapter, it's a good idea to look at both numerical summary measures and visualizations. These help with understanding data and are a good way to find data cleaning steps you may have missed. The `nhanes_combined` dataset has been pre-loaded for you.
Say we have access to NHANES patients and want to study how being told by a physician to reduce calories/fat in their diet affects patients' weight. This counseling is our treatment; we're pretending that, instead of this being a question asked of the patient, we randomly had physicians counsel some patients on their nutrition. However, we suspect that weight may also differ by the gender of the patient, which makes gender a blocking factor!
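One quick way to judge whether gender is worth blocking on is to compare mean weight within each gender-by-treatment cell. The sketch below is only an illustration on a small made-up data frame: the column names `treatment`, `gender`, and `weight_kg` and all of the values are hypothetical and are not taken from NHANES.

```r
library(dplyr)

# Hypothetical toy data, NOT real NHANES values
toy <- data.frame(
  treatment = c(1, 1, 2, 2, 1, 1, 2, 2),
  gender    = c("F", "F", "F", "F", "M", "M", "M", "M"),
  weight_kg = c(70, 74, 66, 68, 88, 92, 82, 86)
)

# Mean weight in each gender-by-treatment cell
cell_means <- toy %>%
  group_by(gender, treatment) %>%
  summarize(mean_weight = mean(weight_kg), .groups = "drop")

print(cell_means)
```

If the cell means differ much more between genders than between treatment groups, as they do in this toy example by construction, blocking on gender keeps that variation from masking the treatment effect.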
This exercise is part of the course
Experimental Design in R
Exercise instructions
- Fill in and run the `dplyr` code to find mean weight (`bmxwt`) in kg by our treatment (`mcq365d`). Is there anything interesting about the `NA` treated patients?
- Fill in the `ggplot2` code to look at a boxplot of the IQR of patients' weights by the treatment variable.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Fill in the dplyr code
___ %>%
  group_by(___) %>%
  summarize(mean = mean(___, na.rm = TRUE))

# Fill in the ggplot2 code
___ %>%
  ggplot(aes(as.factor(___), ___)) +
    geom_boxplot() +
    labs(x = "Treatment",
         y = "Weight")
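To see how this pattern behaves before filling in the blanks, here is a minimal sketch on a made-up data frame. The names `toy`, `treated`, and `weight_kg` and every value are hypothetical stand-ins, not the `nhanes_combined` columns. It shows two things: `group_by()` keeps `NA` as its own group, and `na.rm = TRUE` drops missing weights before averaging.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data, NOT real NHANES values
toy <- data.frame(
  treated   = c(1, 1, 2, 2, NA, NA),
  weight_kg = c(80, 90, 70, NA, 20, 30)
)

# Mean weight per treatment group; rows with treated == NA form their own group
toy_means <- toy %>%
  group_by(treated) %>%
  summarize(mean = mean(weight_kg, na.rm = TRUE))

print(toy_means)

# Boxplot of weight by treatment; as.factor() makes the x-axis discrete
toy_plot <- toy %>%
  ggplot(aes(as.factor(treated), weight_kg)) +
    geom_boxplot() +
    labs(x = "Treatment",
         y = "Weight")
```

Printing `toy_plot` draws the figure. ggplot2 warns that it removed the row with a missing weight, and the `NA` group gets its own box on the x-axis.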