Intro to NHANES and sampling

1. Introduction to NHANES and sampling

In this chapter, we're going to learn how to design and analyze Randomized Complete Block Design and Balanced Incomplete Block Design experiments. Before we can do that, we'll explore the NHANES dataset and talk about sampling. Let's jump in.

2. Intro to NHANES dataset

NHANES, or the National Health and Nutrition Examination Survey, is conducted in two-year cycles in the United States to collect information that helps determine the prevalence of major diseases and their risk factors in the U.S. NHANES combines interviews and physical exams to collect demographic, medical, dental, dietary, and general health-related information about participants. The sample is constructed so that the individuals surveyed resemble the United States' population. The information collected benefits not only government agencies, but also researchers interested in risk factors and health conditions in the U.S. The participants in NHANES aren't chosen by simple random sampling; they're selected with a complex, multistage probability design built to represent the U.S. population. People older than 60, African Americans, and Hispanic individuals are oversampled in the collection. Caucasians made up 60% of the U.S. population at the last census, so if NHANES did not oversample these other groups, we wouldn't end up with enough African American and Hispanic individuals to produce reliable statistics on their health.

3. Intro to sampling

In this course, we're only going to discuss probability sampling methods, meaning those where units are selected at random with a known probability of selection. Non-probability sampling includes things like voluntary response and convenience sampling, which I encourage you to look into yourself. Let's discuss five basic types of sampling.

4. Sampling - Part 1

A simple random sample is the most straightforward: every unit in a population has an equal probability of being sampled. In R, you can implement it with the sample() function, which draws without replacement by default. For more information on how to use this function, be sure to read its documentation. Stratified sampling involves splitting your population by some stratifying variable, such as race, gender, or type, and then taking a simple random sample within each stratum. In R, stratified sampling can be carried out simply using group_by() and slice_sample() from dplyr.
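As a rough sketch of both approaches, the R code below draws a simple random sample and a stratified sample from a small made-up data frame. The `population` object and its `id` and `group` columns are hypothetical and not part of NHANES, and the code assumes dplyr 1.0.0 or later is installed so that slice_sample() is available.

```r
library(dplyr)

set.seed(123)  # make the random draws reproducible

# Hypothetical population: 100 units split into three strata of unequal size
population <- data.frame(
  id = 1:100,
  group = rep(c("A", "B", "C"), times = c(60, 30, 10))
)

# Simple random sample: draw 10 row indices without replacement,
# so every unit has the same chance of being selected
srs_rows <- sample(nrow(population), size = 10)
srs <- population[srs_rows, ]

# Stratified sample: a simple random sample of 5 units within each stratum
stratified <- population %>%
  group_by(group) %>%
  slice_sample(n = 5) %>%
  ungroup()

# Count how many units were drawn from each stratum
table(stratified$group)
```

Drawing the same number of units from every stratum oversamples the smallest group relative to its share of the population, which is the same idea NHANES applies to some demographic groups.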

5. Sampling - Part 2

For cluster sampling, you divide the population into groups called clusters, for example treating every high school in a state as one cluster, randomly select some number of those clusters, and sample everyone inside the selected clusters. A good way to conduct cluster sampling is the cluster() function in the sampling package. Systematic sampling involves choosing a sample in a systematic way, such as every 5th or 10th unit of the population. It's best implemented in R with a custom function. Multi-stage sampling simply combines two or more of the aforementioned approaches in a logical way.
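The sketch below illustrates cluster, systematic, and multi-stage sampling. It deliberately uses base R instead of cluster() from the sampling package so that it runs without extra dependencies. The `students` data frame, its columns, and the `systematic_sample()` helper are hypothetical names made up for illustration.

```r
set.seed(42)  # make the random draws reproducible

# Hypothetical population: 200 students spread evenly over 10 schools
students <- data.frame(
  student_id = 1:200,
  school = rep(paste0("school_", 1:10), each = 20)
)

# Cluster sampling: randomly pick 3 schools, then keep every student in them
chosen_schools <- sample(unique(students$school), size = 3)
cluster_sample <- students[students$school %in% chosen_schools, ]

# Systematic sampling: random start within the first k rows, then every k-th row
systematic_sample <- function(data, k) {
  start <- sample(k, size = 1)  # random start between 1 and k
  rows <- seq(from = start, to = nrow(data), by = k)
  data[rows, , drop = FALSE]
}
systematic <- systematic_sample(students, k = 10)

# Multi-stage sampling: reuse the chosen schools (stage 1), then take a
# simple random sample of 5 students within each of them (stage 2)
multi_stage <- do.call(rbind, lapply(chosen_schools, function(s) {
  in_school <- students[students$school == s, ]
  in_school[sample(nrow(in_school), size = 5), ]
}))
```

Choosing the starting row at random keeps the systematic sample a probability sample; always starting at the first row would make the selection fully deterministic.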

6. Let's practice!

Now that we have all the essential tools for the following exercises, let's get started on the practice!