Correlations and t-tests

1. Correlations and t-tests

Before developing models, it is crucial to explore relationships in your data. This video will teach you how to visualize and test bivariate relationships between the variables.

2. Correlations compare SAS and R

Correlation measures the strength of the association between two variables. Like PROC CORR in SAS, the corr.test() function from the psych package computes Pearson's correlations between numerical variables. It also computes a confidence interval and a p-value for each pairwise correlation.

3. Code for correlation analyses

In both SAS and R, correlations can be computed between multiple variables at once.

4. Correlations with psych package

In this first example, you will run the corr.test() function from the psych package to get a 3 by 3 correlation matrix between bmi, weight, and height in the daviskeep dataset. The corr.test() function also provides a p-value for each correlation. Here the p-values are all so small that they round to zero at the default precision, which is why the output only shows zeros. In an upcoming video you will learn how to display the p-values with greater precision.
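As a rough sketch of the call described above, the code below runs corr.test() on three numeric columns. The daviskeep dataset is not reproduced here, so the data frame `demo` and all of its values are simulated stand-ins, an assumption made purely for illustration.

```r
# Simulated stand-in for the course data; all values are made up.
library(psych)

set.seed(42)
n <- 100
height <- rnorm(n, mean = 170, sd = 9)                    # cm
weight <- 66 + 0.9 * (height - 170) + rnorm(n, sd = 8)    # kg, loosely tied to height
demo <- data.frame(
  bmi    = weight / (height / 100)^2,
  weight = weight,
  height = height
)

# Pearson correlations for every pair of columns,
# with p-values and confidence intervals
ct <- corr.test(demo, method = "pearson")

print(ct$r, digits = 2)   # 3 x 3 correlation matrix
print(ct$p, digits = 3)   # p-values (entries above the diagonal are adjusted for multiple tests)
print(ct$ci)              # lower bound, r, upper bound, and p for each pair
```

Printing ct on its own gives the compact summary with rounded p-values; pulling out ct$p and ct$ci is one way to inspect the individual pieces.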

5. Scatterplot matrix SAS and R

You can make scatterplot matrices to visualize the association between pairs of variables using the ggpairs() function from the GGally package, similar to the PLOTS=MATRIX option in SAS's PROC CORR.

6. Code for scatterplot matrix SAS and R

The code shown here in SAS and R will create all three of the possible bivariate scatterplots between bmi, weight, and height.

7. Scatterplot matrix - GGally::ggpairs() function

The ggpairs plot from the GGally package also includes the correlations and density plots.

8. Scatterplot matrix - ggpairs by group

You can add sex to the select() call and map the ggpairs color aesthetic to sex to get correlations, boxplots, and histograms by sex.
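A minimal sketch of both scatterplot matrices follows, first for the three numeric variables alone and then colored by sex. The data frame `demo` is simulated and hypothetical; it only borrows the column names used in the video.

```r
# Simulated stand-in for the course data; all values are made up.
library(GGally)
library(ggplot2)   # provides aes()

set.seed(42)
n <- 100
sex    <- factor(rep(c("F", "M"), each = n / 2))
height <- rnorm(n, mean = ifelse(sex == "M", 178, 165), sd = 7)   # cm
weight <- rnorm(n, mean = ifelse(sex == "M", 76, 58), sd = 8)     # kg
demo <- data.frame(
  sex    = sex,
  bmi    = weight / (height / 100)^2,
  weight = weight,
  height = height
)

# Scatterplots, densities, and correlations for the numeric columns only
p_all <- ggpairs(demo, columns = c("bmi", "weight", "height"))

# Same idea with sex included and mapped to color
p_by_sex <- ggpairs(demo, mapping = aes(color = sex))

print(p_by_sex)
```

When the data frame includes a factor such as sex, ggpairs fills the mixed numeric-by-factor panels with boxplots and histograms by default, which is where the by-group panels come from.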

9. Descriptive stats by group

When comparing two groups, it is good to review the means and standard deviations by group using group_by() and summarise(). You can also get the group sample sizes by adding the n() function after across() within the summarise() call.
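The grouped summary described here could look something like the sketch below. The data frame `demo` and its group sizes are simulated assumptions, not the course data.

```r
# Simulated stand-in for the course data; all values are made up.
library(dplyr)

set.seed(42)
demo <- data.frame(
  sex    = rep(c("F", "M"), times = c(60, 40)),
  bmi    = c(rnorm(60, mean = 22, sd = 3), rnorm(40, mean = 25, sd = 3)),
  weight = c(rnorm(60, mean = 58, sd = 8), rnorm(40, mean = 76, sd = 10))
)

by_sex <- demo %>%
  group_by(sex) %>%
  summarise(
    # mean and sd of each listed column, named <column>_<statistic>
    across(c(bmi, weight), list(mean = mean, sd = sd)),
    # number of rows in each group
    n = n()
  )

print(by_sex)
```

Placing n = n() after across() keeps the new n column out of the set of columns that across() summarises.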

10. T-tests SAS and R

Independent sample t-tests are used to compare means between two groups. The output from PROC TTEST in SAS includes the equal variance test plus the results of both the pooled t-test for equal group variances and the unpooled t-test for unequal group variances. In R, however, the test for equal variances must be run separately using the var.test() function before running the t.test() function.

11. Code for t-tests SAS and R

In SAS the group variable is designated by the CLASS statement and the continuous variable is designated by the VAR statement.

12. Code for t-tests SAS and R

However, in R, a common way to state the relationship to be tested is to use formula syntax, indicated by the tilde operator. The formula bmi ~ sex, highlighted here, indicates that bmi is modeled by sex.

13. Code for t-tests SAS and R

Also, the default for the t.test() function is an unpooled t-test, because the var.equal option is FALSE by default. To get a pooled t-test, var.equal has to be set to TRUE.
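To make the formula syntax and the var.equal option concrete, here is a small base R sketch that runs both versions of the test. The data frame `demo` is simulated and hypothetical; only the variable names follow the video.

```r
# Simulated stand-in for the course data; all values are made up.
set.seed(42)
demo <- data.frame(
  sex = factor(rep(c("F", "M"), times = c(60, 40))),
  bmi = c(rnorm(60, mean = 22, sd = 3), rnorm(40, mean = 25, sd = 4))
)

# Unpooled (Welch) t-test: var.equal is FALSE unless you say otherwise
unpooled <- t.test(bmi ~ sex, data = demo)

# Pooled t-test: assumes the two groups share a common variance
pooled <- t.test(bmi ~ sex, data = demo, var.equal = TRUE)

print(unpooled)
print(pooled)

# The degrees of freedom show which test was run:
# the pooled test uses n1 + n2 - 2, the Welch test uses an adjusted value
unpooled$parameter
pooled$parameter
```

The formula puts the continuous outcome on the left of the tilde and the two-level grouping variable on the right, which plays the role of the VAR and CLASS statements in SAS.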

14. T-tests - check for equal variances

You need to test the equal variances assumption to know whether to run a pooled or unpooled t-test for bmi by sex. While the p-value is significant here, it is also useful to look at the ratio of variances, which is close to 0.5. This indicates that one variance is approximately twice as large as the other, a difference that is not too large, so either the pooled or the unpooled test should be fine.
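The equal variances check can be sketched with base R's var.test(), which takes the same formula syntax as t.test(). The data below are simulated stand-ins, so the ratio and p-value will not match the values discussed in the video.

```r
# Simulated stand-in for the course data; all values are made up.
set.seed(42)
demo <- data.frame(
  sex = factor(rep(c("F", "M"), times = c(60, 40))),
  bmi = c(rnorm(60, mean = 22, sd = 3), rnorm(40, mean = 25, sd = 4))
)

# F test comparing the variance of bmi in the two groups
vt <- var.test(bmi ~ sex, data = demo)
print(vt)

# Ratio of sample variances: first factor level (F) over second (M)
vt$estimate

# The same ratio computed by hand, for comparison
var(demo$bmi[demo$sex == "F"]) / var(demo$bmi[demo$sex == "M"])

# p-value for the null hypothesis that the true ratio is 1
vt$p.value
```

A ratio near 1 suggests similar spread in the two groups; reading the ratio alongside the p-value helps judge whether a statistically significant difference in variances is also a large one.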

15. T-tests - pooled and unpooled

The unpooled t-test output is shown here. To run the pooled t-test, you set var.equal to TRUE. Both tests indicate that the mean bmi for males (group M) is significantly higher than for females (group F), as shown by the very small p-values.

16. Let's explore bivariate relationships in abalones!

Let's explore bivariate relationships in abalones!