
Resampling NHANES data

The NHANES data is collected from sampled units (people) specifically selected to represent the U.S. population. To get a feel for the different sampling methods, however, let's resample the nhanes_final dataset in several ways.

We can draw a simple random sample using slice_sample() from dplyr. It takes a dataset and n, an integer giving the number of rows to sample.
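As a minimal sketch of a simple random sample, the snippet below uses a small made-up data frame instead of nhanes_final. The names toy, id, and group are hypothetical and exist only for illustration.

```r
library(dplyr)

set.seed(42)  # make the random draw reproducible

# Hypothetical toy data standing in for nhanes_final
toy <- data.frame(
  id    = 1:100,
  group = rep(c("a", "b"), each = 50)
)

# Simple random sample of 10 rows; sampling is without replacement by default
toy_srs <- toy %>% slice_sample(n = 10)

nrow(toy_srs)
```

Every row has the same chance of being selected, so the split between the groups in toy_srs will vary from draw to draw.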

Stratified sampling can be done by combining group_by() and slice_sample(). On grouped data, slice_sample() samples n rows from each of the groups specified in group_by().
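The grouped version can be sketched the same way. This example assumes a made-up data frame with a hypothetical stratum column of unequal group sizes, to show that each stratum still contributes the same number of rows.

```r
library(dplyr)

set.seed(42)  # make the random draw reproducible

# Hypothetical toy data: stratum "a" has 70 rows, stratum "b" has 30
toy <- data.frame(
  id      = 1:100,
  stratum = rep(c("a", "b"), times = c(70, 30))
)

# Stratified sample: 5 rows drawn independently within each stratum
toy_stratified <- toy %>%
  group_by(stratum) %>%
  slice_sample(n = 5)

# Check that each stratum contributed the same number of rows
toy_stratified %>%
  ungroup() %>%
  count(stratum)
```

The ungroup() call before count() is a stylistic choice here; count() on the still-grouped data would give the same per-stratum totals.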

The sampling package's cluster() draws cluster samples. It takes a dataset, the name of the variable to use as the cluster variable, passed as a character vector (e.g. c("variable")), the number of clusters to select, and a sampling method.
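A rough sketch of a cluster sample on made-up data follows. The data frame toy and its household column are hypothetical. The sketch also uses getdata() from the same package to pull the rows of the selected clusters, which is an addition beyond what this exercise asks for.

```r
library(sampling)

set.seed(42)  # make the random draw reproducible

# Hypothetical toy data: 12 households (clusters) with 5 people each
toy <- data.frame(
  id        = 1:60,
  household = rep(paste0("h", 1:12), each = 5)
)

# Select 3 whole clusters by simple random sampling without replacement
toy_clusters <- cluster(toy, c("household"), 3, method = "srswor")

# Extract the rows of toy that belong to the selected clusters
toy_cluster_data <- getdata(toy, toy_clusters)

nrow(toy_cluster_data)
```

Unlike the stratified sample, every unit inside a selected cluster is kept, and units in unselected clusters are left out entirely.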

This exercise is part of the course

Experimental Design in R


Exercise instructions

  • Use slice_sample() to select 2500 observations from nhanes_final and save as nhanes_srs.
  • Create nhanes_stratified by using group_by() and slice_sample(). Stratify by riagendr and select 2000 of each gender. Confirm that it worked by using count() to examine nhanes_stratified's gender variable.
  • Load the sampling package. Use cluster() to select 6 clusters from nhanes_final, with "indhhin2" as the cluster variable and "srswor" as the method. Assign to nhanes_cluster.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Use slice_sample() to create nhanes_srs
nhanes_srs <- ___ %>% ___(n=___)

# Create nhanes_stratified with group_by() and slice_sample()
___ <- ___ %>% group_by(___) %>% ___(n=___)
nhanes_stratified %>% ___

# Load sampling package and create nhanes_cluster with cluster()
___
nhanes_cluster <- cluster(___, "___", 6, method = "srswor")