Resampling NHANES data

The NHANES data is collected on sampled units (people) specifically selected to represent the U.S. population. However, let's resample the nhanes_final dataset in different ways so we get a feel for the different sampling methods.

We can draw a simple random sample using slice_sample() from dplyr. It takes a dataset and n, an integer giving the number of rows to sample.
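As a minimal sketch of a simple random sample, the snippet below uses R's built-in mtcars dataset as a stand-in (an assumption for illustration; it is not the NHANES data), and the name cars_srs and the seed are likewise illustrative.

```r
library(dplyr)

# Fix the seed so the draw is reproducible (illustrative choice)
set.seed(42)

# Simple random sample: 10 rows drawn without replacement (the default)
cars_srs <- mtcars %>% slice_sample(n = 10)

# Check how many rows were sampled
nrow(cars_srs)
```

Every row has the same chance of being picked, and no grouping information is used.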

Stratified sampling can be done by combining group_by() and slice_sample(). After grouping, slice_sample() draws n rows from each of the groups specified in group_by().
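A stratified draw might look like the following sketch, again on the built-in mtcars dataset rather than NHANES, with the cyl column standing in as the stratification variable (both are assumptions for illustration).

```r
library(dplyr)

set.seed(42)

# Stratified sample: 5 rows from each cyl group
cars_stratified <- mtcars %>%
  group_by(cyl) %>%
  slice_sample(n = 5) %>%
  ungroup()

# Confirm that each stratum contributed the same number of rows
cars_stratified %>% count(cyl)
```

Counting by the grouping variable afterwards is a quick way to confirm that each stratum has the requested number of rows.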

The sampling package's cluster() creates cluster samples. Its arguments are the dataset, the name of the cluster variable passed as a character vector (e.g. c("variable")), the number of clusters to select, and the sampling method.
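The call pattern can be sketched as below, assuming the sampling package is installed. The built-in mtcars dataset and its cyl column are stand-ins chosen for illustration, not the NHANES variables.

```r
library(sampling)

set.seed(42)

# Cluster sample: pick 2 of the 3 cyl groups by simple random
# sampling without replacement ("srswor"); every unit in a
# selected cluster is included in the result
cars_cluster <- cluster(mtcars, c("cyl"), 2, method = "srswor")

# Which clusters were selected
unique(cars_cluster$cyl)
```

Unlike the stratified draw, whole groups are either kept or left out: only 2 of the 3 cyl values appear in the result. The returned data frame describes the selected units rather than all original columns. The package's getdata() is meant for pulling the corresponding rows from the source data.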

This exercise is part of the course

Experimental Design in R

Exercise instructions

  • Use slice_sample() to select 2500 observations from nhanes_final and save as nhanes_srs.
  • Create nhanes_stratified by using group_by() and slice_sample(). Stratify by riagendr and select 2000 observations of each gender. Confirm that it worked by using count() to examine nhanes_stratified's gender variable.
  • Load the sampling package. Use cluster() to select 6 clusters from nhanes_final, with "indhhin2" as the cluster variable and the "srswor" method. Assign to nhanes_cluster.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Use slice_sample() to create nhanes_srs
nhanes_srs <- ___ %>% ___(n=___)

# Create nhanes_stratified with group_by() and slice_sample()
___ <- ___ %>% group_by(___) %>% ___(n=___)
nhanes_stratified %>% ___

# Load sampling package and create nhanes_cluster with cluster()
___
nhanes_cluster <- cluster(___, "___", 6, method = "srswor")