Effect size

1. Effect size

We've now learned about various types of hypothesis tests. Next, we'll learn how to measure how large an effect one variable has on another.

2. What is effect size?

Effect size is a measure of the strength of the relationship between two variables. For example, it is well known that smoking has a very strong relationship with cancer. Therefore, the effect size between the number of years smoked and the probability of getting cancer is very high. On the other hand, something like a poor diet may also contribute to cancer risk, but its effect size is much smaller than that of smoking.

3. Why measure effect size

Measuring effect size is important because we are often interested in understanding not just if two variables are related, but how strongly they are related. In the previous example, perhaps both smoking and diet have an effect on the chance of getting cancer. But clearly smoking has a much larger effect!

4. Effect size versus p-values

This leads to an important concept, namely, the relationship between effect size and p-value. The p-value tells us how strong the evidence is that a relationship exists at all. It does not tell us how large that relationship is, and with a big enough sample even a tiny effect can produce a very small p-value. The effect size measures the strength of the relationship. This is why it's important to consider both when making inferences.
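To make the distinction concrete, here is a minimal sketch using simulated data. The group names, sample size, and true difference of zero-point-zero-five standard deviations are all assumptions chosen for illustration, not values from the course.

```python
import numpy as np
from scipy import stats

# Simulated (hypothetical) data: two large groups whose true means
# differ by only 0.05 standard deviations.
rng = np.random.default_rng(0)
n = 100_000
group_a = rng.normal(loc=0.00, scale=1.0, size=n)
group_b = rng.normal(loc=0.05, scale=1.0, size=n)

# Two-sample t-test: with this much data the p-value is very small.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Standardized mean difference (Cohen's d). With equal group sizes the
# pooled variance is just the average of the two sample variances.
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
d = (group_b.mean() - group_a.mean()) / pooled_sd

print(f"p-value: {p_value:.2e}")
print(f"Cohen's d: {d:.3f}")
```

The test comes out clearly significant, yet the standardized difference stays far below even a small effect. The p-value alone would overstate how much the difference matters.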

5. Effect size for means - Cohen's d

A common metric to measure standardized effect size between two sample means is Cohen's d. To calculate Cohen's d we need the mean, sample size, and standard deviation of each of our two samples. Next we calculate the pooled standard deviation s: we weight each sample variance by its sample size minus one, add the two together, divide by the total sample size minus two, and take the square root. Finally, Cohen's d is defined as the difference in the sample means divided by the pooled standard deviation. In the exercises we'll perform these calculations.
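The calculation just described can be sketched as follows. The function name cohens_d and the two small samples are hypothetical, made up for illustration.

```python
import numpy as np

def cohens_d(sample_1, sample_2):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    x1 = np.asarray(sample_1, dtype=float)
    x2 = np.asarray(sample_2, dtype=float)
    n1, n2 = len(x1), len(x2)
    # Sample standard deviations (ddof=1 divides by n - 1).
    s1, s2 = x1.std(ddof=1), x2.std(ddof=1)
    # Pooled standard deviation: variances weighted by (n - 1).
    pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    # Difference in sample means, in units of the pooled standard deviation.
    return (x1.mean() - x2.mean()) / pooled_sd

# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [4.2, 4.8, 4.5, 5.0, 4.4, 4.7]
d = cohens_d(group_a, group_b)
print(f"Cohen's d: {d:.2f}")
```

The sign of d only reflects which sample is passed first, so the magnitude is what gets compared against the rules of thumb.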

6. Interpreting Cohen's d

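Cohen's d can be interpreted using the displayed rules of thumb. Cohen's conventional benchmarks are roughly zero-point-two for a small effect, zero-point-five for a medium effect, and zero-point-eight for a large effect. So for example, if we calculated a value of zero-point-six, that would indicate a medium-to-large effect size.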

7. Effect size for correlation

Next we'll consider correlations. We've actually already seen effect size for correlation. If we calculate Pearson's R and then square it we get R-squared, which is an effect size for correlation. Recall that this value is the proportion of the variation in one variable that is explained by knowing the other. In our case we see that the closing price of the S-and-P 500 has a very strong association with the closing price of Bitcoin. Knowing one explains eighty-two percent of the variation in the other! Keep in mind that this measures the strength of a linear association, not proof that one causes the other.
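As a rough sketch of that calculation, we can square the Pearson correlation from scipy.stats.pearsonr. The arrays x and y below are hypothetical stand-ins, not the S-and-P 500 or Bitcoin prices from the course.

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations (placeholders for two price series).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Pearson's R and the p-value of the test for no linear association.
r, p_value = stats.pearsonr(x, y)

# Squaring R gives the proportion of variation in one variable
# explained by a linear relationship with the other.
r_squared = r**2
print(f"R: {r:.3f}, R-squared: {r_squared:.3f}")
```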

8. Effect size for categorical variables

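Finally, effect size for categorical variables can be calculated using Cramer's V. We need the chi-squared test statistic from our data, as well as the total number of data points and degrees of freedom. Note that the degrees of freedom here is the smaller of the number of rows minus one and the number of columns minus one, which is not the same as the degrees of freedom of the chi-squared test itself. Cramer's V is then the square root of the chi-squared statistic divided by the product of the number of data points and the degrees of freedom.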

9. Calculating Cramer's V

If we have a contingency table comparing gender and job title of employees, we can compute the chi-squared test statistic using the chi2_contingency function from scipy.stats. Then we take the total number of observations in the table and the degrees of freedom, the smaller of rows minus one and columns minus one, and use them to compute Cramer's V.
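Here is a minimal sketch of those steps. The counts in the table are hypothetical, not the employee data from the course. We also pass correction=False, which is our assumption rather than something the course specifies: by default chi2_contingency applies Yates' continuity correction to two-by-two tables, which would shrink the statistic.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: rows are gender, columns are job title.
table = np.array([[30, 10],
                  [15, 25]])

# Chi-squared statistic without Yates' continuity correction (an assumption).
chi2, p_value, dof_test, expected = chi2_contingency(table, correction=False)

# Total number of observations.
n = table.sum()

# Degrees of freedom for Cramer's V: min(rows - 1, columns - 1).
# This differs in general from dof_test, the test's own degrees of freedom.
dof = min(table.shape[0] - 1, table.shape[1] - 1)

cramers_v = np.sqrt(chi2 / (n * dof))
print(f"chi-squared: {chi2:.3f}, Cramer's V: {cramers_v:.3f}")
```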

10. Interpreting Cramer's V

Cramer's V always takes a value between zero and one, with zero indicating no relationship and one indicating a perfect association. The interpretation uses both the degrees of freedom and the value of Cramer's V. Here we see a table summarizing the effect size based on both values. For example, our value of zero-point-five-two with one degree of freedom corresponds to a large effect size for gender on job title.

11. Let's practice!

Now that we've seen the ideas, let's practice!
