Exercise

From a bootstrap ensemble to the standard error

In the previous exercise, you constructed one resampling trial to get a rough idea of how the effect size of age on wage might have varied had a different random sample of the original population been used. In practice, one carries out many such trials in order to sketch out the resampling distribution.

Of course, you could write a loop to carry out the repeated trials. However, the operation is so common that the statisticalModeling package provides a function for it, called ensemble(). An ensemble is a collection of trials. Each trial in the output of ensemble() consists of a resampled data frame and a model trained on that data frame. Once you have the ensemble, you can calculate the numerical quantity of interest on each trial in order to see the resampling distribution.
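
To make the loop alternative concrete, here is a minimal base-R sketch of what a hand-written resampling loop might look like. It is not how ensemble() is implemented. The data frame toy is a synthetic stand-in for CPS85 with made-up values, and one_trial() is a hypothetical helper introduced only for illustration. The sketch also relies on the fact that, for a linear model with no interaction terms, the effect size with respect to age coincides with the fitted coefficient on age.

```r
# Synthetic stand-in for CPS85 (values are made up, for illustration only)
set.seed(101)
n <- 200
toy <- data.frame(
  age    = sample(18:64, n, replace = TRUE),
  sector = factor(sample(c("clerical", "manuf", "service"), n, replace = TRUE))
)
toy$wage <- 5 + 0.08 * toy$age + rnorm(n, sd = 4)

# Hypothetical helper: one resampling trial.
# Resample rows with replacement, refit the model, return the age coefficient.
one_trial <- function(data) {
  rows <- sample(nrow(data), replace = TRUE)
  mod  <- lm(wage ~ age + sector, data = data[rows, ])
  unname(coef(mod)["age"])
}

# Carry out 100 trials with an explicit loop
n_trials <- 100
slopes <- numeric(n_trials)
for (i in seq_len(n_trials)) {
  slopes[i] <- one_trial(toy)
}

# The spread of the age coefficient across trials
boot_se <- sd(slopes)
boot_se
```

Each pass through the loop plays the role of one trial, and the vector slopes is a rough picture of the resampling distribution.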

The code in the editor starts you out with a model, wage ~ age + sector, trained on the CPS85 data.

Instructions
  • Use ensemble() to create 10 resampling trials. Note that ensemble() takes as input the original model, which provides all the information needed to create the trials.
  • Use effect_size() to find the effect size of age on wage. View the results to get a sense of how much variation there is from trial to trial.
  • In practice, it's common to construct resampling ensembles of 100 to 500 trials. Use ensemble() again to create an ensemble of that size.
  • Use effect_size() on the bigger set of trials.
  • Calculate the standard deviation of the effect size across the trials. This is the bootstrapped estimate of the standard error of the effect size, a measure of the precision of the quantity calculated on the original model.
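
The steps above might be sketched as follows. This is one possible approach, not the reference solution. It assumes that CPS85 is available from the mosaicData package, that ensemble() accepts the number of trials through an nreps argument, and that the data frame returned by effect_size() stores the effect in a column named slope. Check these against the package documentation before relying on them. The object names small_trials, big_trials and big_effects are illustrative.

```r
library(statisticalModeling)

# Assumption: the CPS85 data frame ships with the mosaicData package
data("CPS85", package = "mosaicData")

# The starting model from the exercise
mod <- lm(wage ~ age + sector, data = CPS85)

# 10 resampling trials (nreps is assumed to set the number of trials)
small_trials <- ensemble(mod, nreps = 10)

# Effect size of age on wage, one row per trial
effect_size(small_trials, ~ age)

# A larger ensemble, in the 100 to 500 range
big_trials <- ensemble(mod, nreps = 100)
big_effects <- effect_size(big_trials, ~ age)

# Bootstrapped standard error: the standard deviation across trials
# (assumes the effect is stored in a column named slope)
sd(big_effects$slope)
```

Because the trials are random resamples, the standard deviation changes somewhat from run to run. A larger ensemble makes the estimate more stable.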