1. Covariance and correlation
Hi, and welcome to the final chapter of the course.
In this lesson, we will review covariance and correlation.
2. Covariance and correlation
Interviewers might be interested in your knowledge of these metrics because they reveal the dependency between two variables.
For example, if the use of a particular material is strongly correlated with higher production costs, that correlation can prompt managers to look for substitute supplies that lower manufacturing cost.
3. Linear relationship
Covariance and correlation measure the linear relationship between two variables.
4. Covariance
Covariance is a measure of the joint variability of two random variables.
In the slide, you can review the formula for covariance for a sample and a population.
5. Covariance
Note that in the case of the formula for a sample, you divide by n minus 1,
6. Covariance
whereas in the case of the formula for a population, you divide by n. In your day-to-day work, you are more likely to deal with the covariance of a sample.
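The slide itself is not reproduced in this transcript, so as a reference, here are the standard textbook definitions written out. The symbols (s_xy for the sample covariance, sigma_xy for the population covariance, x-bar and mu for the sample and population means) are conventional notation assumed here, not necessarily the notation used on the slide.

```latex
% Sample covariance: divide by n - 1
s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

% Population covariance: divide by n
\sigma_{xy} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)
```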
7. Covariance - numerical example
Let's quickly go through the calculation of covariance by hand.
You have a sample set of observations of x and corresponding observations of y.
In the first step, you calculate the average of the two samples.
Then, subtract the average value from each of the observations and multiply the corresponding results of the subtractions.
Sum up all the values.
And, finally, divide by the number of observations minus one. The covariance in this example amounts to 7, which implies a positive linear relationship between the two variables.
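The by-hand steps above can be sketched in a few lines of Python. Note that the data below are hypothetical and are not the values shown on the slide, so the result differs from the 7 quoted above. The function name `sample_covariance` is also just an illustrative choice.

```python
# Hypothetical sample data (NOT the slide's values), chosen for easy arithmetic.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sample_covariance(x, y):
    """Sample covariance computed step by step, as in the by-hand walkthrough."""
    n = len(x)
    # Step 1: average of each sample.
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Step 2: subtract the averages and multiply the paired deviations.
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    # Steps 3 and 4: sum the products and divide by n - 1.
    return sum(products) / (n - 1)


cov_xy = sample_covariance(x, y)
print(cov_xy)  # 1.5, positive, so x and y tend to move in the same direction
```

For a population covariance, the only change would be dividing by `n` instead of `n - 1`.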
8. Correlation coefficient
The correlation coefficient is another measure of a linear relationship between two variables. To calculate the correlation coefficient, you divide the covariance by the product of the standard deviations of the two variables.
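As a rough illustration of that calculation, here is a minimal sketch that divides a sample covariance by the product of the two sample standard deviations. The data are hypothetical, and the function name `correlation_coefficient` is an assumption made for this example.

```python
import math

# Hypothetical sample data, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def correlation_coefficient(x, y):
    """Correlation as sample covariance over the product of sample standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return cov / (sd_x * sd_y)


r = correlation_coefficient(x, y)
print(round(r, 4))  # 0.7746
```

Unlike covariance, the result has no units, which is what makes it comparable across different pairs of variables.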
9. Correlation coefficient
The correlation coefficient is always between -1 and 1.
A value of 1 implies a perfect positive linear relationship.
10. Correlation coefficient
A correlation coefficient of -1 implies a perfect negative linear relationship.
11. Correlation coefficient
A correlation coefficient between 0 and 1 means that there is some positive linear relationship.
12. Correlation coefficient
A coefficient close to zero implies almost no linear relationship.
13. Correlation coefficient
A value between -1 and 0 implies that there is some negative linear relationship.
14. Correlation coefficient
A correlation coefficient with a higher absolute value, that is, one closer to 1 or to -1, means that the data is more tightly clustered around a straight line.
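To make these cases concrete, here is a small sketch with made-up data: points exactly on an increasing line, points exactly on a decreasing line, and points that only loosely follow a line. The helper `pearson_r` is a hypothetical name. It uses sums of deviations directly, which is equivalent to dividing the covariance by the standard deviations because the n minus 1 factors cancel.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient from sums of deviations (the n - 1 factors cancel)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]  # hypothetical observations

r_pos = pearson_r(x, [2 * xi + 1 for xi in x])  # exactly on an increasing line
r_neg = pearson_r(x, [-3 * xi for xi in x])     # exactly on a decreasing line
r_loose = pearson_r(x, [2, 4, 5, 4, 5])         # increasing trend with scatter

print(r_pos, r_neg, r_loose)
```

The first two values come out at 1 and -1 (up to floating-point rounding), while the third falls between 0 and 1 because the points are scattered around the line rather than on it.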
15. Nonlinear relationships
Covariance and the correlation coefficient are measures of a linear relationship. If two variables are not linearly related, it doesn't mean that they are not related at all! Take a look at these examples. The variables are clearly related, but because the relationship between them is not linear, their correlation is close to zero.
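The slide's plots are not available in this transcript, so here is a minimal stand-in example of the same idea, using made-up data: y is completely determined by x (y equals x squared), yet the correlation coefficient is zero because the relationship is symmetric rather than linear. The helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient computed from sums of deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: x is symmetric around zero and y depends on x exactly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)  # 0.0: no linear relationship, even though y is a function of x
```

This is why it is worth plotting the data before concluding from a near-zero correlation that two variables are unrelated.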
16. Correlation does not imply causation!
Remember that correlation does not imply causation. We can observe that two variables change together, but we don't know if the change of one variable causes the change of the other. It's important to get this right in an interview.
17. Summary
In this lesson, we've covered two measures of a linear relationship: covariance and the correlation coefficient.
18. Let's practice!
Now that we've reviewed the theory, let's practice!