Exercise

Resampling NHANES data

The NHANES data are collected from sampled units (people) selected specifically to represent the U.S. population. Even so, let's resample the nhanes_final dataset in a few different ways to get a feel for the main sampling methods.

We can draw a simple random sample with slice_sample() from dplyr. It takes a dataset and the number of rows to sample, supplied as the named argument n (e.g. n = 100).
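A minimal sketch of a simple random sample. Because nhanes_final is not reproduced here, it uses a hypothetical toy data frame, toy_nhanes, with made-up values; only the column names riagendr and indhhin2 mirror the exercise.

```r
library(dplyr)

# Hypothetical stand-in for nhanes_final: 10,000 rows with an ID, a gender
# code (1 = male, 2 = female, as in NHANES riagendr) and a made-up
# income category from 1 to 15.
set.seed(42)
toy_nhanes <- data.frame(
  seqn = 1:10000,
  riagendr = sample(1:2, 10000, replace = TRUE),
  indhhin2 = sample(1:15, 10000, replace = TRUE)
)

# Simple random sample without replacement: every row has the same chance
# of selection. n must be named because it follows ... in slice_sample().
toy_srs <- slice_sample(toy_nhanes, n = 2500)

nrow(toy_srs)  # 2500
```

Setting the seed makes the draw reproducible; without it, each run selects a different set of rows.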

Stratified sampling can be done by combining group_by() and slice_sample(): on a grouped dataset, slice_sample() draws n rows from each group defined in group_by().
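A minimal sketch of stratified sampling, again on a hypothetical toy data frame (toy_nhanes) with made-up values standing in for nhanes_final. Gender (riagendr) defines the strata, and count() checks the result.

```r
library(dplyr)

# Hypothetical stand-in for nhanes_final (made-up values).
set.seed(42)
toy_nhanes <- data.frame(
  seqn = 1:10000,
  riagendr = sample(1:2, 10000, replace = TRUE),
  indhhin2 = sample(1:15, 10000, replace = TRUE)
)

# group_by() defines the strata; slice_sample() then draws 2000 rows
# independently within each stratum. ungroup() drops the grouping afterwards.
toy_stratified <- toy_nhanes %>%
  group_by(riagendr) %>%
  slice_sample(n = 2000) %>%
  ungroup()

# Each gender code should appear exactly 2000 times.
toy_stratified %>% count(riagendr)
```

If a stratum had fewer than n rows, slice_sample() (sampling without replacement) would return all of that stratum's rows instead, so the count() check is worth keeping.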

The sampling package's cluster() draws cluster samples. It takes a dataset, the name of the cluster variable passed as a character vector (e.g. c("variable")), the number of clusters to select, and a sampling method such as "srswor" (simple random sampling without replacement).
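A minimal sketch of cluster sampling, assuming the sampling package is installed and using a hypothetical toy data frame (toy_nhanes) whose made-up indhhin2 values 1 to 15 act as the clusters. Six clusters are selected at random, and every unit in a selected cluster is kept.

```r
library(sampling)

# Hypothetical stand-in for nhanes_final (made-up values).
set.seed(42)
toy_nhanes <- data.frame(
  seqn = 1:10000,
  riagendr = sample(1:2, 10000, replace = TRUE),
  indhhin2 = sample(1:15, 10000, replace = TRUE)
)

# Select 6 of the 15 income categories by simple random sampling without
# replacement; all rows belonging to the chosen categories are included.
toy_cluster <- cluster(toy_nhanes, clustername = c("indhhin2"), size = 6,
                       method = "srswor")

# cluster() returns the selected cluster values, the row IDs of the selected
# units (ID_unit) and their inclusion probabilities (Prob).
table(toy_cluster$indhhin2)

# getdata() attaches the remaining columns of the original data.
toy_cluster_data <- getdata(toy_nhanes, toy_cluster)
```

Unlike slice_sample(), cluster() does not return the full rows directly, which is why getdata() is used to recover them.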

Instructions

  • Use slice_sample() to select 2500 observations from nhanes_final and save as nhanes_srs.
  • Create nhanes_stratified by using group_by() and slice_sample(). Stratify by riagendr (gender) and select 2000 rows for each gender. Confirm that it worked by using count() on nhanes_stratified's riagendr variable.
  • Load the sampling package. Use cluster() with "indhhin2" as the cluster variable to select 6 clusters from nhanes_final using the "srswor" method. Assign the result to nhanes_cluster.