1. Research question

Consider a situation where you want to determine whether there is a linear relationship between protein and carbohydrates in the entire population of foods from Starbucks. We will walk through the pieces of the linear model output here, and in the following chapters we will explore each piece of inference in more detail.

2. Protein & carbohydrates: research question

The variables in the Starbucks dataset include calories, fat, carbohydrates, fiber, and protein. If we are interested in a linear relationship between two of these variables, we can frame the linear model investigation in two ways: with a one-sided or a two-sided hypothesis. A two-sided research question asks whether the two variables are linearly associated in either direction. A one-sided research question (in this scenario) asks whether the two variables have a positive linear association. To avoid an inflated rate of false positives, the research question must be decided on before looking at the data.
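Writing \(\beta_1\) for the population slope (notation introduced here, not taken from the output), the two ways of framing the research question correspond to these hypotheses:

```latex
\begin{aligned}
\text{Two-sided:}\quad & H_0\colon \beta_1 = 0 \quad \text{vs.} \quad H_A\colon \beta_1 \neq 0 \\
\text{One-sided:}\quad & H_0\colon \beta_1 = 0 \quad \text{vs.} \quad H_A\colon \beta_1 > 0
\end{aligned}
```

Both versions share the same null hypothesis; only the alternative, and therefore the way the p-value is computed, changes.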

3. Linear model output: estimates

Note the two different (but similar) ways to output the linear model information. Recall that the estimates are calculated using least squares optimization, so the value of the slope (0.381) is exactly the same regardless of the output format. As with the slope, the intercept (37.1) has the same value in both the long and the tidy format.
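As a minimal sketch of what "least squares" means for one explanatory variable, the closed-form estimates can be computed directly: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept makes the line pass through the point of means. The data below are made up purely for illustration (they are not the Starbucks values), and the function name `least_squares` is hypothetical.

```python
# Hypothetical toy data (NOT the Starbucks values): x is the explanatory
# variable, y is the response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def least_squares(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy: sum of cross-deviations; Sxx: sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The least squares line always passes through (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = least_squares(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")  # intercept = 2.200, slope = 0.600
```

Because the least squares solution is unique (as long as the x values are not all equal), any output format built from the same fit reports these same two numbers.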

4. Linear model output: standard error

The variability of both the intercept and the slope is given in the standard error column. The standard error estimates how much the intercept (row 1) or the slope (row 2) would vary from sample to sample, measured in the units of that coefficient.
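A sketch of how these standard errors are typically computed for simple linear regression, assuming the usual model conditions (independent errors with constant variance). It reuses a small made-up data set, not the Starbucks data; the residual standard error divides by n - 2 because two coefficients were estimated.

```python
import math

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

# Residual standard error: sqrt(SSE / (n - 2)), since two parameters were estimated
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard errors of the two coefficients under the usual linear-model assumptions
se_slope = s / math.sqrt(sxx)
se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / sxx)

print(f"SE(intercept) = {se_intercept:.4f}, SE(slope) = {se_slope:.4f}")
# SE(intercept) = 0.9381, SE(slope) = 0.2828
```

The slope's standard error shrinks as the x values spread out (larger `sxx`), which is why a wide range of protein values would pin down the slope more precisely.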

5. Linear model output: statistic

In both outputs, there is a column labeled "statistic" that combines the least squares estimate with the standard error. The statistic is a standardized estimate: the estimate divided by its standard error, which measures how many standard errors the estimate lies above zero (or below zero, if the statistic is negative). As with the estimate and standard error columns, the intercept statistic is given in the first row (15.04) and the slope statistic in the second row (2.2).
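Because the statistic is the estimate divided by its standard error, the rounded values quoted above imply a slope standard error of roughly 0.381 / 2.2 ≈ 0.17 (approximate, since both inputs are rounded). The sketch below builds a small coefficient table on made-up data; the function name `coefficient_table` is hypothetical, and the fit is recomputed inside it so the block runs on its own.

```python
import math

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def coefficient_table(x, y):
    """Return rows of (term, estimate, std_error, statistic) for a simple linear fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    se_slope = s / math.sqrt(sxx)
    # statistic = (estimate - 0) / standard error:
    # the number of standard errors the estimate sits away from zero
    return [
        ("intercept", intercept, se_intercept, intercept / se_intercept),
        ("slope", slope, se_slope, slope / se_slope),
    ]


for term, est, se, stat in coefficient_table(x, y):
    print(f"{term:<10} {est:8.3f} {se:8.3f} {stat:8.3f}")
```

The rows follow the same order as the model output: intercept first, slope second.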

6. Linear model output: p.value (two-sided)

Last, the p-value in the final column of the output tests whether the intercept or the slope is zero. The default test is two-sided, and it is important to keep in mind that R does not know what your research question is. For the model at hand, it is easy to reject zero as a plausible value for the intercept: data like these would be extremely unlikely to come from a population with an intercept of zero. The slope has a p-value of 0.03, which is significant at the 0.05 level. It tells us that if there were no linear relationship between protein and carbs in the population, we would see data at least this extreme about 3 percent of the time. If the original research question had been one-sided (that is, are protein and carbs positively associated?), the p-value would be divided by two, giving a one-sided p-value of about 0.015; halving is valid here because the estimated slope is positive, in the direction of the alternative. The evidence looks stronger under the one-sided test, but that test is only appropriate if the research question was one-sided from the start.
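A dependency-free sketch of how two-sided and one-sided p-values follow from the statistic and a t distribution with n - 2 degrees of freedom. R computes these with its own t-distribution routine (`pt()`); here, as a swapped-in technique, the tail area is approximated with Simpson's rule. The function names are hypothetical, and the statistic 2.1213 with 3 degrees of freedom comes from a five-point made-up data set, not from the Starbucks model.

```python
import math


def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] and symmetry about 0."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3


def p_values(statistic, df):
    """Return (two-sided p, one-sided p for H_A: coefficient > 0)."""
    tail = upper_tail(abs(statistic), df)
    two_sided = 2 * tail
    # Halving the two-sided p-value is only right when the estimate
    # points in the hypothesized (positive) direction.
    one_sided = tail if statistic > 0 else 1 - tail
    return two_sided, one_sided


# Hypothetical slope statistic from a toy fit with n = 5 points (df = n - 2 = 3).
two, one = p_values(2.1213, df=3)
print(f"two-sided p = {two:.4f}, one-sided p = {one:.4f}")
```

With a positive statistic the one-sided p-value is exactly half the two-sided one; with a negative statistic it would be close to 1, which is why the direction of the estimate has to be checked before halving.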

7. Let's practice!

Thanks for following along with this video. Now it is your turn to practice!