Exercise

Population & sampling distribution means

One of the useful features of sampling distributions is that you can quantify them. In particular, you can calculate summary statistics on them. Here, we'll look at the relationship between the mean of the sampling distribution and the population parameter that the sampling is supposed to estimate.

Three sampling distributions are provided. In each case, the employee attrition dataset was sampled using simple random sampling, and the mean attrition of that sample was calculated. This was repeated 1000 times to get a sampling distribution of mean attritions. One sampling distribution used a sample size of 5 for each replicate, one used 50, and one used 500.
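As a rough illustration of how such a sampling distribution could be built, here is a minimal sketch. It does not use the real attrition_pop data: toy_pop, its 0/1 Attrition column, the 16% attrition rate, and the helper make_sampling_distribution() are all hypothetical stand-ins chosen for this example, and the column name mean_attrition is an assumption.

```r
library(dplyr)

set.seed(2024)

# Hypothetical stand-in population: 1 = employee left, 0 = employee stayed.
toy_pop <- tibble(Attrition = rbinom(1470, size = 1, prob = 0.16))

# Draw `reps` simple random samples of size `n` (without replacement)
# and record the mean attrition of each sample.
make_sampling_distribution <- function(pop, n, reps = 1000) {
  tibble(
    mean_attrition = replicate(reps, mean(sample(pop$Attrition, size = n)))
  )
}

toy_distribution_5   <- make_sampling_distribution(toy_pop, 5)
toy_distribution_50  <- make_sampling_distribution(toy_pop, 50)
toy_distribution_500 <- make_sampling_distribution(toy_pop, 500)

# The population parameter that each sample mean estimates.
toy_pop_mean <- toy_pop %>%
  summarize(mean_attrition = mean(Attrition))
```

Each resulting tibble has one row per replicate, so the three toy distributions differ only in the sample size used for each replicate.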

attrition_pop, sampling_distribution_5, sampling_distribution_50, and sampling_distribution_500 are available; dplyr is loaded.

Instructions 1/2

  • Using sampling_distribution_5, calculate the mean of the mean attrition values across all the replicates (a mean of sample means). Store this in a column called mean_mean_attrition.
  • Do the same calculation using sampling_distribution_50 and sampling_distribution_500.
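The calculation asked for above can be sketched on a small hand-made table. This is only an illustration: toy_sampling_distribution and its five values are made up, and the column name mean_attrition is an assumption about how the provided sampling distributions are laid out.

```r
library(dplyr)

# Hypothetical sampling distribution with five replicates,
# one sample mean per row.
toy_sampling_distribution <- tibble(
  mean_attrition = c(0.2, 0.0, 0.4, 0.2, 0.0)
)

# Mean of the sample means, stored in a column called mean_mean_attrition.
toy_mean_of_means <- toy_sampling_distribution %>%
  summarize(mean_mean_attrition = mean(mean_attrition))

toy_mean_of_means
```

With the real data, the same summarize() call would be applied to each of the three provided sampling distributions in turn, if they share this column layout.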