Bootstrapping the average maternal age
Maternal age, or the age of a mother at the time of giving birth, is an important marker of natal health in a population. A maternal age that is too high or too low can adversely affect the outcome of the birth.
You work for the US Department of Health as a Data Analyst. You are given a list, ls_df, of 51 data frames, one for each US state and Washington DC. Each data frame contains the column mother_age. Your boss would like you to bootstrap a distribution of the mean maternal age for each state. You have already written a loop to do the bootstrap on a single data frame. You need to parallelize this calculation. The parallel package has been loaded for you.
This exercise is part of the course Parallel Programming in R.
Exercise instructions
- Wrap the bootstrap loop into a function that returns the distribution of the mean.
- Set up a cluster of four cores.
- Apply the bootstrap function to ls_df in parallel using parLapply().
- Stop the cluster once all calculations are done.
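The four steps above can be sketched end to end on made-up data. This is a minimal sketch, not the official solution: ls_df_demo is a hypothetical stand-in for ls_df built from synthetic ages, and the column is assumed to be named mother_age, as in the sample code's loop.

```r
library(parallel)

# Hypothetical stand-in for ls_df: three small data frames of synthetic ages
set.seed(1)
ls_df_demo <- lapply(1:3, function(i) {
  data.frame(mother_age = sample(18:45, 200, replace = TRUE))
})

# Step 1: wrap the bootstrap loop in a function that returns the
# bootstrap distribution of the mean (1,000 resampled means)
boot_mean <- function(df) {
  est <- rep(0, 1e3)
  for (i in 1:1e3) {
    b <- sample(df$mother_age, replace = TRUE)
    est[i] <- mean(b)
  }
  return(est)
}

# Step 2: start a cluster of four worker processes
cl <- makeCluster(4)

# Step 3: apply the function to each data frame in parallel;
# the result is a list with one numeric vector per data frame
state_dist_demo <- parLapply(cl, ls_df_demo, boot_mean)

# Step 4: shut the workers down once the work is done
stopCluster(cl)

str(state_dist_demo)
```

parLapply() takes the cluster as its first argument, followed by the list and the function, and returns a list of the same length as its input, so the call can replace a serial lapply() with little other change.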
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Wrap the loop into a function
boot_mean <- ___ (df) ___
  est <- rep(0, 1e3)
  for (i in 1:1e3) {
    b <- sample(df$mother_age, replace = TRUE)
    est[i] <- mean(b)
  }
  return(est)
___
# Make a cluster of four
cl <- ___
# Apply function to ls_df in parallel
state_dist <- ___
# Stop cluster
___(cl)
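Once the template is completed, state_dist should be a list holding one bootstrap distribution per state. One common way to use such a result is to reduce each distribution to a point estimate and a percentile interval. The sketch below assumes that shape and uses state_dist_demo, a hypothetical stand-in built serially from synthetic ages; the names A and B are placeholders rather than real states.

```r
# Hypothetical stand-in for state_dist: a named list of bootstrap
# distributions, each a vector of 1,000 resampled means (synthetic data)
set.seed(42)
ages <- list(
  A = sample(18:45, 200, replace = TRUE),
  B = sample(18:45, 200, replace = TRUE)
)
state_dist_demo <- lapply(ages, function(x) {
  replicate(1e3, mean(sample(x, replace = TRUE)))
})

# Summarize each distribution: bootstrap mean plus a 95% percentile interval
summary_demo <- t(sapply(state_dist_demo, function(est) {
  c(mean = mean(est), quantile(est, probs = c(0.025, 0.975)))
}))

# One row per list element, with columns mean, 2.5% and 97.5%
print(summary_demo)
```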