Resampling NHANES data
The NHANES data is collected on sampled units (people) specifically selected to represent the U.S. population. However, let's resample the nhanes_final dataset in different ways so we get a feel for the different sampling methods.
We can conduct a simple random sample using slice_sample() from dplyr. It takes as input a dataset and an integer giving the number of rows to sample.
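As a minimal sketch of a simple random sample, the snippet below draws 10 rows from a small made-up data frame. The names toy and toy_srs and the seed are assumptions for illustration only; they stand in for nhanes_final and are not part of the exercise.

```r
library(dplyr)

set.seed(42)  # fix the seed so the random draw is reproducible

# Hypothetical toy data standing in for nhanes_final
toy <- tibble(id = 1:100, value = rnorm(100))

# Draw 10 rows at random; slice_sample() samples without replacement by default
toy_srs <- toy %>% slice_sample(n = 10)

nrow(toy_srs)
```

Because sampling is without replacement by default, no row of toy can appear twice in toy_srs.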
Stratified sampling can be done by combining group_by() and slice_sample(). The function will sample n rows from each of the groups specified in group_by().
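The stratified pattern can be sketched on toy data as follows. The data frame toy, its group column, and the sample size of 20 are hypothetical and chosen only to show that each stratum contributes the same number of rows regardless of its original size.

```r
library(dplyr)

set.seed(42)  # fix the seed so the random draw is reproducible

# Hypothetical toy data with two unequal strata: 70 rows of "a", 30 rows of "b"
toy <- tibble(
  id    = 1:100,
  group = rep(c("a", "b"), times = c(70, 30))
)

# Sample 20 rows within each stratum, then drop the grouping
toy_stratified <- toy %>%
  group_by(group) %>%
  slice_sample(n = 20) %>%
  ungroup()

# Confirm the per-stratum sample sizes
toy_stratified %>% count(group)
```

Note that when a group has fewer than n rows and sampling is without replacement, slice_sample() returns only the rows that group has, so checking the result with count() is a sensible habit.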
The sampling package's cluster() creates cluster samples. It takes a dataset, the name of the cluster variable passed as a character vector (e.g. c("variable")), the number of clusters to select, and a method.
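A rough sketch of cluster sampling on toy data is shown below. The data frame toy and its income_band column are hypothetical stand-ins. The sketch assumes, as the sampling package documentation describes, that cluster() returns one row per unit in the selected clusters along with the unit's row index (ID_unit) and inclusion probability, and that getdata() from the same package can be used to pull the matching rows from the original data.

```r
library(sampling)

set.seed(42)  # fix the seed so the random draw is reproducible

# Hypothetical toy data: 12 clusters of 5 units each
toy <- data.frame(
  id          = 1:60,
  income_band = rep(1:12, each = 5)
)

# Select 4 whole clusters by simple random sampling without replacement
toy_cluster <- cluster(toy, c("income_band"), 4, method = "srswor")

# Number of distinct clusters that were selected
length(unique(toy_cluster$income_band))

# Retrieve the original rows belonging to the selected clusters
toy_cluster_data <- getdata(toy, toy_cluster)
```

Unlike the stratified case, every unit inside a chosen cluster is kept, and clusters that were not chosen contribute nothing.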
This exercise is part of the course Experimental Design in R.
Exercise instructions
- Use slice_sample() to select 2500 observations from nhanes_final and save as nhanes_srs.
- Create nhanes_stratified by using group_by() and slice_sample(). Stratify by riagendr and select 2000 of each gender. Confirm that it worked by using count() to examine nhanes_stratified's gender variable.
- Load the sampling package. Use cluster() to divide nhanes_final by "indhhin2" into 6 clusters using the "srswor" method. Assign to nhanes_cluster.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Use slice_sample() to create nhanes_srs
nhanes_srs <- ___ %>% ___(n=___)
# Create nhanes_stratified with group_by() and slice_sample()
___ <- ___ %>% group_by(___) %>% ___(n=___)
nhanes_stratified %>% ___
# Load sampling package and create nhanes_cluster with cluster()
___
nhanes_cluster <- cluster(___, "___", 6, method = "srswor")