
Introduction to correlations

1. Introduction to correlations

Correlations investigate associations between variables.

2. Correlation in A/B design

A correlation assesses the strength and direction of the relationship between two variables, that is, how consistently one variable increases or decreases as the other increases or decreases. We hypothesize that enjoyment is correlated with the time it takes to eat pizza. Because AB test designs have two groups, such as Cheese and Pepperoni topping pizza, and each group can have multiple measures, such as time to eat pizza and enjoyment of pizza, several correlations can be run on an AB design. One option is to ignore the groups and assess the correlation between time and enjoyment of pizza in general.
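As a rough illustration of ignoring the groups, the Python sketch below pools every pizza into one list and computes a single Pearson correlation coefficient between enjoyment and time to eat. The pizza records and the helper name `pearson_r` are hypothetical, made up for this example; in R, the `cor()` function plays the role of the helper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products of deviations and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical records: (topping, enjoyment score, minutes to eat)
pizzas = [
    ("Cheese", 3, 10.0), ("Cheese", 5, 12.5), ("Cheese", 6, 13.0), ("Cheese", 8, 15.5),
    ("Pepperoni", 4, 9.0), ("Pepperoni", 6, 10.5), ("Pepperoni", 7, 12.0), ("Pepperoni", 9, 12.5),
]

# Ignore the topping column and correlate enjoyment with time across all pizzas
enjoyment = [e for _, e, _ in pizzas]
minutes = [m for _, _, m in pizzas]
overall_r = pearson_r(enjoyment, minutes)
print(f"overall r = {overall_r:.3f}")
```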

3. Correlation in A/B design

We can also test within groups, assessing the correlation between time and enjoyment for Cheese pizza only and, separately, for Pepperoni pizza only. In AB designs, each group is generally of interest on its own. Finding different correlations in each group can reveal how the group affects the relationship between the variables.
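A minimal sketch of the within-group approach, using the same kind of made-up pizza records: the data are split by topping and a separate Pearson coefficient is computed for each group. The records and the `pearson_r` helper are illustrative assumptions, not the course's dataset or code.

```python
import math
from collections import defaultdict

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical records: (topping, enjoyment score, minutes to eat)
pizzas = [
    ("Cheese", 3, 10.0), ("Cheese", 5, 12.5), ("Cheese", 6, 13.0), ("Cheese", 8, 15.5),
    ("Pepperoni", 4, 9.0), ("Pepperoni", 6, 10.5), ("Pepperoni", 7, 12.0), ("Pepperoni", 9, 12.5),
]

# Collect (enjoyment list, minutes list) separately for each topping
groups = defaultdict(lambda: ([], []))
for topping, enjoyment, minutes in pizzas:
    groups[topping][0].append(enjoyment)
    groups[topping][1].append(minutes)

# One correlation coefficient per group
by_group = {topping: pearson_r(e, m) for topping, (e, m) in groups.items()}
for topping, r in by_group.items():
    print(f"{topping}: r = {r:.3f}")
```

Comparing the per-group coefficients is what lets the group's effect on the relationship show up; a single pooled coefficient would hide it.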

4. Correlation

Remember that correlation does not imply causation. The number of drownings and ice cream sales are significantly correlated, both increasing together. The likely explanation is that more ice cream is eaten, and more drownings occur, during warmer months. AB tests can help us infer causation by making a change and testing its effect. Because the topping is the only difference between the Cheese and Pepperoni groups, a difference in correlation between the groups can be attributed to pepperoni as the likely cause. We are assessing enjoyment against time to eat pizza and suspect that enjoyment may affect eating time. If the relationship between enjoyment and time differs between the groups, we infer that the difference is caused by the topping. Making meaningful changes in AB tests therefore provides insight into the variables. A correlation also does not establish that one variable depends on the other: the number of drownings does not depend on the number of ice cream sales, or vice versa. The data can be viewed with ggplot2 by passing the dataset to ggplot(), mapping the variables to x and y inside aes(), and adding geom_point() to create a scatter plot.
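The ice cream example can be mimicked with a small deterministic simulation. In the sketch below, monthly ice cream sales and drownings are each generated only from temperature plus fixed noise, yet the two series end up strongly correlated with each other. All temperatures, coefficients, and noise values are hypothetical, chosen for illustration rather than taken from real sales or drowning figures.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical average temperature (degrees C) for each month of one year
temperature = [2, 4, 8, 13, 18, 22, 25, 24, 19, 13, 7, 3]
# Fixed, made-up noise so the example is deterministic
ice_noise = [5, -3, 4, -6, 2, -1, 3, -4, 6, -2, 1, -5]
drown_noise = [0.5, -0.4, 0.2, -0.3, 0.6, -0.5, 0.1, 0.4, -0.2, 0.3, -0.6, 0.0]

# Each series is computed from temperature only; neither uses the other
ice_cream_sales = [50 + 12 * t + e for t, e in zip(temperature, ice_noise)]
drownings = [1 + 0.3 * t + e for t, e in zip(temperature, drown_noise)]

print(f"r(ice cream, drownings) = {pearson_r(ice_cream_sales, drownings):.2f}")
```

Since only temperature drives both series here, the high coefficient reflects a shared cause rather than one variable depending on the other.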

5. Correlation coefficient

The correlation coefficient, r, measures the strength and direction of a linear association, ranging from negative one to positive one. A negative correlation indicates that one variable increases as the other decreases. A positive correlation means that as one variable increases, the other variable also increases. Zero indicates no linear correlation. The further the coefficient is from zero, in either direction, the stronger the correlation and the better a straight line fits the data. Of particular interest to many AB design studies is the ability to make predictions, which relies on correlations. For example, the stronger the correlation, the better enjoyment of pizza can predict the time to eat the pizza, or vice versa.
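To make the range of r concrete, the sketch below computes the Pearson coefficient for three tiny hypothetical datasets: a perfect positive relationship, a perfect negative one, and a curved one. The curved case is worth noting: y is fully determined by x, yet r is essentially zero, because r only captures straight-line association. The data and helper are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
perfect_positive = [2, 4, 6, 8, 10]  # y rises as x rises
perfect_negative = [10, 8, 6, 4, 2]  # y falls as x rises
curved = [2, 4, 6, 4, 2]             # rises then falls: no straight-line trend

cases = [("positive", perfect_positive), ("negative", perfect_negative), ("curved", curved)]
for name, y in cases:
    print(f"{name}: r = {pearson_r(x, y):+.2f}")
```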

6. Correlation values

The cor() function gives the correlation coefficient. A correlation coefficient above 0.7 (or below -0.7), such as this one, is generally considered strong. Beyond the strength of the relationship, the proportion of variation in the dependent variable (here, time to eat) that is explained by the independent variable (enjoyment) is given by R-squared, found by squaring the correlation coefficient, saved here as corvalue.
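The same steps can be sketched in Python with made-up enjoyment and time values: compute r, keep it in a variable named after the transcript's `corvalue`, square it for R-squared, and apply the 0.7 rule of thumb mentioned above. The data and the `pearson_r` helper are hypothetical stand-ins for the R workflow.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical enjoyment scores and minutes to eat for eight pizzas
enjoyment = [3, 5, 6, 8, 4, 6, 7, 9]
minutes = [10.0, 12.5, 13.0, 15.5, 9.0, 10.5, 12.0, 12.5]

corvalue = pearson_r(enjoyment, minutes)  # analogous to saving cor() output in R
r_squared = corvalue ** 2                 # share of variance explained by the linear fit
is_strong = abs(corvalue) > 0.7           # rule of thumb from the text

print(f"r = {corvalue:.3f}, R-squared = {r_squared:.3f}, strong: {is_strong}")
```

With r of about 0.70 in this made-up data, R-squared is about 0.49, so roughly half of the variation in time to eat is accounted for by the linear relationship with enjoyment.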

7. Correlation limitations

Though outliers should be assessed in any test, correlations are particularly susceptible to distortion by outliers. Note that the correlation coefficient does not give the slope of the line of best fit, shown here in red. That line is estimated in regression analysis, which builds on correlation. Additionally, the correlation coefficient by itself does not indicate statistical significance; instead, it is used along with the sample size to calculate a p-value and decide whether the null hypothesis can be rejected.
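Three of these limitations can be checked numerically. The Python sketch below uses small hypothetical datasets to show that two perfectly correlated datasets can have very different slopes, that one extreme point can flip the sign of r, and how r combines with the sample size n in the standard test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. Converting t to a p-value requires the t distribution, which Python's standard library does not provide; in R, cor.test() reports the p-value directly. The helper names are illustrative.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope of the regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def t_statistic(r, n):
    """t statistic for testing the null hypothesis r = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

x = [1, 2, 3, 4, 5]
steep = [2 * v for v in x]      # slope 2
shallow = [0.5 * v for v in x]  # slope 0.5

# 1. Same r, different slopes: r does not give the slope of the best-fit line
print("steep:   r =", pearson_r(x, steep), "slope =", slope(x, steep))
print("shallow: r =", pearson_r(x, shallow), "slope =", slope(x, shallow))

# 2. A single extreme point added to perfectly correlated data
with_outlier_x = x + [6]
with_outlier_y = steep + [-20]
print("with outlier: r =", pearson_r(with_outlier_x, with_outlier_y))

# 3. The same r gives a larger t (and so a smaller p-value) with more data
for n in (10, 50):
    print(f"r = 0.5, n = {n}: t = {t_statistic(0.5, n):.2f}")
```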

8. Let's practice!

Let's practice this.