Resampling NHANES data
The NHANES data is collected on sampled units (people) specifically selected to represent the U.S. population. However, let's resample the nhanes_final dataset in different ways so we get a feel for the different sampling methods.
We can draw a simple random sample using slice_sample() from dplyr. It takes a dataset and n, the number of rows to sample, given as an integer.
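As a minimal sketch of simple random sampling with slice_sample(), the example below uses a small made-up data frame rather than nhanes_final. The name toy and its columns are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for a real survey dataset
set.seed(42)
toy <- tibble(
  id = 1:20,
  group = rep(c("a", "b"), each = 10),
  value = rnorm(20)
)

# Simple random sample of 5 rows; slice_sample() samples
# without replacement unless replace = TRUE is given
toy_srs <- toy %>% slice_sample(n = 5)
toy_srs
```

Every row has the same chance of being drawn, regardless of its group.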
Stratified sampling can be done by combining group_by() and slice_sample(). When applied to grouped data, slice_sample() samples n rows from each of the groups specified in group_by().
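The group_by() plus slice_sample() pattern might look like the sketch below. It again uses a hypothetical toy data frame (toy, stratum, and value are illustrative names, not part of the exercise data), with strata of deliberately unequal size to show that each one still contributes the same number of rows.

```r
library(dplyr)

# Hypothetical toy data with two strata of unequal size (14 and 6 rows)
set.seed(42)
toy <- tibble(
  id = 1:20,
  stratum = rep(c("a", "b"), times = c(14, 6)),
  value = rnorm(20)
)

# Sample 4 rows from each stratum, then drop the grouping
toy_stratified <- toy %>%
  group_by(stratum) %>%
  slice_sample(n = 4) %>%
  ungroup()

# Check that each stratum contributes exactly 4 rows
toy_stratified %>% count(stratum)
```

Calling ungroup() at the end is optional here, but it avoids carrying the grouping into later steps by accident.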
The sampling package's cluster() function draws cluster samples. It takes a dataset, the name of the cluster variable passed as a character vector (e.g. c("variable")), the number of clusters to select, and a sampling method.
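A rough sketch of a cluster() call on made-up data follows. The data frame toy and its household column are hypothetical. The sketch also assumes the sampling package's getdata() helper is used afterwards to pull the sampled rows; the exercise itself does not require that step.

```r
library(sampling)

# Hypothetical toy data: 4 households (clusters) with 3 people each
set.seed(42)
toy <- data.frame(
  id = 1:12,
  household = rep(1:4, each = 3),
  value = rnorm(12)
)

# Select 2 of the 4 households by simple random sampling
# without replacement ("srswor")
picked <- cluster(toy, clustername = c("household"), size = 2,
                  method = "srswor")

# picked lists every unit in the selected clusters;
# getdata() attaches the original columns for those units
toy_cluster <- getdata(toy, picked)
toy_cluster
```

Unlike the stratified case, whole clusters are either in or out of the sample: every member of a selected household is included.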
This exercise is part of the course
Experimental Design in R
Exercise instructions
- Use slice_sample() to select 2500 observations from nhanes_final and save as nhanes_srs.
- Create nhanes_stratified by using group_by() and slice_sample(). Stratify by riagendr and select 2000 of each gender. Confirm that it worked by using count() to examine nhanes_stratified's gender variable.
- Load the sampling package. Use cluster() to select 6 clusters from nhanes_final, clustering on "indhhin2", with the "srswor" method. Assign to nhanes_cluster.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Use slice_sample() to create nhanes_srs
nhanes_srs <- ___ %>% ___(n=___)
# Create nhanes_stratified with group_by() and slice_sample()
___ <- ___ %>% group_by(___) %>% ___(n=___)
nhanes_stratified %>% ___
# Load sampling package and create nhanes_cluster with cluster()
___
nhanes_cluster <- cluster(___, "___", 6, method = "srswor")