Bootstrapping the average maternal age

Maternal age, the age of a mother at the time of giving birth, is an important marker of natal health in a population. A maternal age that is too high or too low can adversely affect the outcome of the birth.

You work for the US Department of Health as a Data Analyst. You are given a list, ls_df, of 51 data frames, one for each US state and Washington DC. Each data frame contains the column maternal_age. Your boss would like you to bootstrap a distribution of the mean maternal age for each state. You have already written a loop to do the bootstrap on a single data frame. You need to parallelize this calculation. The parallel package has been loaded for you.
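
To try the exercise outside the course environment, a stand-in for ls_df is needed. The sketch below builds one from simulated ages. The sample size of 200 and the normal distribution with mean 28 and standard deviation 6 are arbitrary assumptions for illustration, not values taken from the course data.

```r
# Hypothetical stand-in for ls_df: simulated ages, NOT real maternal-age data.
set.seed(42)

# state.name is a built-in vector of the 50 US state names; add DC to get 51.
state_names <- c(state.name, "District of Columbia")

# One data frame per state, each with a single maternal_age column.
ls_df <- lapply(state_names, function(s) {
  data.frame(maternal_age = round(rnorm(200, mean = 28, sd = 6)))
})
names(ls_df) <- state_names

length(ls_df)  # number of data frames in the list
```

Naming the list elements is optional, but it keeps the state labels attached to the results when the list is later passed to an apply-style function.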

This exercise is part of the course

Parallel Programming in R

Exercise instructions

  • Wrap the bootstrap loop into a function that returns the distribution of the mean.
  • Set up a cluster of four cores.
  • Apply the bootstrap function to ls_df in parallel using parLapply().
  • Stop the cluster once all calculations are done.
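
The cluster steps listed above follow a common pattern in the parallel package: create the workers, hand them a list and a function, then shut them down. A minimal sketch of that pattern on a toy task (squaring numbers, chosen only for illustration) might look like this:

```r
library(parallel)

# Start four worker processes.
cl <- makeCluster(4)

# Apply the function to each element of the input, spread across the workers.
squares <- parLapply(cl, 1:8, function(x) x^2)

# Release the workers once the results are back.
stopCluster(cl)

unlist(squares)  # the squares of 1 through 8, in input order
```

parLapply() returns a list in the same order as its input, so the results can be matched back to the original elements by position.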

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Wrap the loop into a function
boot_mean <- ___ (df) ___
  est <- rep(0, 1e3)
  for (i in 1:1e3) {
    b <- sample(df$maternal_age, replace = TRUE)
    est[i] <- mean(b)
  }
  return(est)
___
# Make a cluster of four
cl <- ___
# Apply function to ls_df in parallel
state_dist <- ___
# Stop cluster
___(cl)
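
One possible completion of the template is sketched below. It is not necessarily the course's reference solution. The three-element ls_df is a simulated stand-in so the block runs on its own, and the call to clusterSetRNGStream() is an optional addition, not part of the exercise, that makes the bootstrap draws reproducible across workers.

```r
library(parallel)

# Tiny simulated stand-in for ls_df (hypothetical values, not real data).
set.seed(1)
ls_df <- lapply(1:3, function(i) {
  data.frame(maternal_age = sample(15:45, 100, replace = TRUE))
})

# Wrap the loop into a function
boot_mean <- function(df) {
  est <- numeric(1e3)
  for (i in seq_len(1e3)) {
    # Resample the ages with replacement and record the mean
    b <- sample(df$maternal_age, replace = TRUE)
    est[i] <- mean(b)
  }
  est
}

# Make a cluster of four
cl <- makeCluster(4)

# Optional: give each worker its own reproducible random-number stream
clusterSetRNGStream(cl, iseed = 123)

# Apply function to ls_df in parallel
state_dist <- parLapply(cl, ls_df, boot_mean)

# Stop cluster
stopCluster(cl)

# Each element holds 1,000 bootstrap means for one data frame
sapply(state_dist, length)
```

Because boot_mean() relies only on base R functions and its own argument, nothing needs to be exported to the workers before calling parLapply().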