Inference for quantitative data

1. Inference for quantitative data

We have summarized our data and we've graphed our data. But what about inference for a quantitative variable? Let's explore how to compare the means of a quantitative survey variable for two sub-groups of the population.

2. Inference for quantitative data

Specifically, we might ask "Is there convincing evidence that the average number of poor health days is different between smokers and non-smokers?" Looking at the plot we created in the last video, it seems like the answer is "Yes!" But remember, this bar plot contains estimates based on the sample and not the actual averages for all smokers and non-smokers in America. And while the error bars do give us a sense of uncertainty, they can't answer our question. We need to run a hypothesis test.

3. Survey-weighted t-test

Because we have one quantitative variable and one categorical variable with two groups, we should run a two-sample t-test. Remember, the null hypothesis is the dull hypothesis: in this case, we assume the average number of poor health days is the same for smokers and non-smokers. We put our conjecture, that smokers have a different average number of poor health days than non-smokers, in the alternative. To complete the hypothesis test, we need to compute the test statistic and the corresponding p-value. Recall that the test statistic summarizes the discrepancy between our sample results and the sample results that would have been most consistent with the null. When we are comparing two groups, as we are here, the test statistic is the difference in the survey-weighted sample means divided by the standard error of that difference. Under the null, it approximately follows a t-distribution, so the R function we will use is svyttest(). We use svyttest() and not t.test() because svyttest() computes a test statistic that accounts for the complex sampling design.
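To make the arithmetic concrete, here is a minimal base R sketch of how a test statistic of this form turns into a two-sided p-value. The three input numbers are hypothetical, chosen only for illustration; they are not estimates from the survey, and the degrees of freedom in particular are an assumption.

```r
# Hypothetical inputs for illustration only (not values from the survey data)
diff_means <- 1.2  # difference in survey-weighted sample means
se_diff    <- 0.4  # standard error of that difference
df         <- 30   # assumed degrees of freedom for the t-distribution

# Test statistic: estimated difference divided by its standard error
t_stat <- diff_means / se_diff

# Two-sided p-value: probability of a statistic at least this far from 0
# in either tail of the t-distribution, if the null were true
p_value <- 2 * pt(-abs(t_stat), df = df)

t_stat
p_value
```

In practice svyttest() computes the standard error and degrees of freedom from the survey design, so this sketch only shows the final step of the calculation.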

4. Survey-weighted t-test

In svyttest(), we provide a formula where our quantitative variable is on the left side and the categorical variable is on the right side. So, here we say formula = DaysPhysHlthBad ~ SmokeNow. We also need to provide the design. The output contains the test statistic, denoted by t. A test statistic equal to 0 would be most consistent with the null. We have a test statistic of 3.82, but is that a large enough discrepancy? Remember, the p-value helps us answer that question. It is also provided in the output and here equals 0.00058. The p-value tells us that, if the average number of poor health days were the same for smokers and non-smokers, the probability of getting results at least this extreme would be very small. Therefore, we have evidence that the average number of poor health days differs between smokers and non-smokers.
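The call described here might look like the sketch below. It assumes the survey and NHANES packages are installed and that the data come from NHANESraw. The design object name NHANES_design, the halved weight column WTMEC4YR, and the choice of strata and cluster variables are assumptions not stated in this transcript, so adapt them to however your design was built.

```r
# Assumes the 'survey' and 'NHANES' packages are installed
library(survey)
library(NHANES)

data(NHANESraw)

# Assumption: NHANESraw pools two 2-year cycles, so halve the 2-year weights
NHANESraw$WTMEC4YR <- 0.5 * NHANESraw$WTMEC2YR

# Assumption: strata and cluster variables as named in NHANESraw
NHANES_design <- svydesign(
  data    = NHANESraw,
  strata  = ~SDMVSTRA,
  ids     = ~SDMVPSU,
  nest    = TRUE,
  weights = ~WTMEC4YR
)

# Quantitative variable on the left of ~, two-group categorical variable on the right
tt <- svyttest(formula = DaysPhysHlthBad ~ SmokeNow, design = NHANES_design)
tt
```

Printing tt shows the test statistic t, the degrees of freedom, and the p-value discussed above.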

5. Let's practice!

Now it's your turn to practice using t-tests on survey data!


Analyzing Survey Data in R
