ANOVA and linear models

1. ANOVA and linear models

Beyond statistical tests for two groups, this video shows you how to perform analysis of variance (ANOVA) to compare more than two groups. You will also learn how to perform simple linear regression and how to work with the output objects these functions create.

2. Note on missing values

Suppose you want to analyze the difference scores between measured and self-reported height that you computed in an earlier video for the davis dataset, to see whether they differ across the bmi categories. Some self-reported heights were missing, so the diffht variable also has missing values, and the requested statistics come back as NA, or "not available".

3. Note on missing values

To exclude missing data from the mean, standard deviation, and variance calculations, add the na.rm option to each statement in the summarise function. With na.rm = TRUE, the statistics are computed after removing the missing values (NAs).
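As a minimal sketch of this behavior, the example below uses a small made-up data frame named toy (a hypothetical stand-in, not the davis data) with a few missing diffht values. The grouped part assumes the dplyr package is installed and is skipped otherwise.

```r
# Made-up data for illustration only (not the davis dataset)
toy <- data.frame(
  bmicat = rep(c("low", "mid", "high"), each = 3),
  diffht = c(2, NA, 0, 1, 3, 2, NA, -1, 1)
)

# A single NA makes the whole statistic NA
mean(toy$diffht)

# na.rm = TRUE drops the NAs before computing
mean(toy$diffht, na.rm = TRUE)

# Inside summarise, every statistic needs its own na.rm = TRUE
if (requireNamespace("dplyr", quietly = TRUE)) {
  by_cat <- dplyr::summarise(
    dplyr::group_by(toy, bmicat),
    mean_diff = mean(diffht, na.rm = TRUE),
    sd_diff   = sd(diffht, na.rm = TRUE),
    var_diff  = var(diffht, na.rm = TRUE)
  )
  print(by_cat)
}
```

Leaving na.rm off any one of the three statements would return NA for that column in every group that contains a missing value.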

4. Analysis of Variance (ANOVA) SAS and R

In SAS you can run the ANOVA and perform Tukey's post hoc pairwise comparisons in one procedure. In R, however, you need two functions: aov and TukeyHSD.

5. Analysis of Variance (ANOVA) SAS and R

The aov function performs analysis of variance in R.

6. Analysis of Variance (ANOVA) SAS and R

The model is defined using the formula syntax you saw in earlier exercises.

7. Analysis of Variance (ANOVA) SAS and R

The TukeyHSD function performs the post hoc pairwise tests.

8. Analysis of Variance (ANOVA)

You can perform an analysis of variance of diffht by bmi category using the aov function, similar to how you ran a t.test previously. The p-value for the ANOVA is shown in the last column of the davisaov summary output as Pr(>F). It is much larger than 0.05, indicating no significant differences between the bmicat groups. The summary also shows that 17 observations were removed due to missingness. So, unlike the mean and sd functions, which need the na.rm = TRUE option, the aov function excludes missing values automatically.
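The following sketch shows the same pattern on simulated data, since the davis data is not reproduced here. The names toy and toyaov and the simulated values are assumptions for illustration, so the numbers will not match the video.

```r
# Simulated stand-in data: three bmi categories, two missing outcomes
set.seed(1)
toy <- data.frame(
  bmicat = factor(rep(c("under", "normal", "over"), each = 10)),
  diffht = rnorm(30)
)
toy$diffht[c(3, 17)] <- NA

# Fit the one-way ANOVA with the formula syntax: outcome ~ group
toyaov <- aov(diffht ~ bmicat, data = toy)

# The summary table includes Pr(>F) and notes the rows deleted for missingness
summary(toyaov)

# Pull the ANOVA p-value out of the summary table
p_value <- summary(toyaov)[[1]][["Pr(>F)"]][1]
p_value
```

No na.rm argument is needed here because aov drops incomplete rows through R's default na.action setting.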

9. Post hoc pairwise tests

Besides reviewing the model output from the aov function, you can use the saved model output as input to other functions, such as TukeyHSD. The TukeyHSD function performs pairwise post hoc tests (using Tukey's honest significant difference method) between the three bmi category groups based on the ANOVA model output. It yields the group mean differences, 95 percent confidence intervals for these differences, and the associated p-values for the mean difference tests.
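A short sketch of passing a saved aov object to TukeyHSD, again on simulated data. The object names toyaov and toytukey are hypothetical and the values are not those from the video.

```r
# Simulated stand-in data with three groups
set.seed(1)
toy <- data.frame(
  bmicat = factor(rep(c("under", "normal", "over"), each = 10)),
  diffht = rnorm(30)
)

# Save the ANOVA model, then hand it to TukeyHSD
toyaov   <- aov(diffht ~ bmicat, data = toy)
toytukey <- TukeyHSD(toyaov)   # 95 percent family-wise confidence level by default
toytukey

# The result is a list with one matrix per factor:
# one row per pair of groups, columns diff, lwr, upr and p adj
toytukey$bmicat
```

With three groups there are three pairwise comparisons, so the matrix has three rows.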

10. Linear regression SAS and R

PROC REG in SAS performs linear regression. In R, you can perform linear regression using the lm function.

11. Linear regression SAS and R

The formula syntax is the same for the lm and aov functions.

12. Simple linear regression

Similar to the aov function, you can run a linear regression model using the lm function. Let's run a simple linear regression of the diffht variable in daviskeep on bmi. The basic model output only shows the fitted coefficients for the intercept and slope. However, the output object actually contains 13 elements.

13. Simple linear regression

You can use the dollar sign to select and display the coefficients specifically and use brackets to select the second coefficient for the slope.
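As a sketch of these steps, the code below fits lm to simulated data and then pulls out the coefficients. The names toy and toylm and the simulated relationship are assumptions, not the daviskeep data.

```r
# Simulated stand-in data: continuous bmi and an outcome with two NAs
set.seed(2)
toy <- data.frame(bmi = runif(30, 18, 35))
toy$diffht <- 1 + 0.1 * toy$bmi + rnorm(30)
toy$diffht[c(5, 20)] <- NA

# Same formula syntax as aov: outcome ~ predictor
toylm <- lm(diffht ~ bmi, data = toy)
toylm                       # prints only the call and the coefficients

# The object holds more than it prints
names(toylm)

# Dollar sign selects an element; brackets select within it
toylm$coefficients          # intercept and slope
slope <- toylm$coefficients[2]
slope
```

Because rows with missing values were dropped, the list returned by names includes an na.action element that records which rows were removed.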

14. Summary of linear regression

To get more detailed reporting for the regression model, the summary of the lm model output can be saved.

15. Summary of linear regression model

The saved output from the lm model summary contains helpful results such as the fit statistics R-squared and adjusted R-squared.

16. Summary of linear regression model

You can selectively display the fit statistic r.squared. There is also a coefficients element in this model summary output that has not only the intercept and slope terms you saw before but also their standard errors, t-tests, and associated p-values.
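A minimal sketch of saving the summary and selecting elements from it, using simulated data. The names toylm and toylmsum are hypothetical.

```r
# Simulated stand-in data
set.seed(2)
toy <- data.frame(bmi = runif(30, 18, 35))
toy$diffht <- 1 + 0.1 * toy$bmi + rnorm(30)

toylm <- lm(diffht ~ bmi, data = toy)

# Save the summary so its elements can be reused
toylmsum <- summary(toylm)
names(toylmsum)

# Fit statistics
toylmsum$r.squared
toylmsum$adj.r.squared

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
toylmsum$coefficients

# Select one cell by row and column name, here the slope p-value
slope_p <- toylmsum$coefficients["bmi", "Pr(>|t|)"]
slope_p
```

Note that the coefficients element of the summary is a matrix, whereas the coefficients element of the lm object itself is a named vector.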

17. Let's go fit and explore models for abalones!

Let's go fit and explore models for abalones!