Correlations
1. Correlations
Let's talk about correlations.
2. Correlations in survey analysis
In statistics, a correlation is an interdependence between variable quantities: a connection between two or more things. When one correlated variable changes, the other tends to change as well. In survey analysis, correlation measures the linear relationship between two survey items. When these variables change together, they covary. However, it is important to note that this covariation isn't necessarily due to a direct or indirect causal link. Correlation can only indicate the strength of the statistical relationship between two survey items. It can't tell us whether one of those items is influencing the other. One reason is that there may be a third variable that affects both variables being studied, making them seem causally related when they aren't. Also, correlation alone can't tell us which variable causes changes in the other.
3. Correlation strength and direction
Correlation coefficients, represented by r, are numeric values for correlations. They range from negative one to positive one. An r of exactly negative one or positive one, that is, an absolute value of one, indicates a perfect linear relationship between variables, and zero signifies no linear relationship. Also, r values less than zero represent a negative relationship between variables, and values greater than zero represent a positive relationship. The fewer data points included in the correlation calculation, the stronger a correlation has to be in order to be considered statistically significant.
4. .corr() function
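To make the range of r concrete, here is a minimal sketch with made-up numbers. It assumes pandas is installed and uses the Series.corr method that this section goes on to describe; the values are chosen only so that the three cases come out cleanly.

```python
import pandas as pd

# Made-up values chosen only to illustrate the range of r.
x = pd.Series([-2, -1, 0, 1, 2])

perfect_positive = x.corr(2 * x + 1)   # y rises in lockstep with x
perfect_negative = x.corr(-3 * x + 4)  # y falls in lockstep with x
no_linear = x.corr(x ** 2)             # symmetric U-shape: no linear trend

print(round(perfect_positive, 3), round(perfect_negative, 3), round(no_linear, 3))
```

Note that the last pair is clearly related (y is x squared), yet r is zero, because r only captures linear relationships.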
The corr function is called on specific columns, or variables, to calculate the correlation, or r value. With survey data, we start with the first column of the DataFrame that we want to analyze, follow it with the dot corr function, and place the second column we want to analyze within the parentheses. Let's see this with a real survey.
5. .corr() example: healthy_city
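Before turning to the real data, the call pattern just described can be sketched on a small invented DataFrame. The name survey, the column names hours_exercise and satisfaction, and every value are hypothetical; only the shape of the call matters here.

```python
import pandas as pd

# Hypothetical survey responses; column names and values are invented.
survey = pd.DataFrame({
    "hours_exercise": [1, 3, 2, 5, 4, 6],
    "satisfaction": [2, 4, 3, 6, 6, 7],
})

# First column, then .corr(), with the second column inside the parentheses.
r = survey["hours_exercise"].corr(survey["satisfaction"])
print(f"r = {r:.3f}")
```

By default, pandas computes Pearson's correlation coefficient here, which is the r discussed earlier.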
The Healthy Lifestyle Cities report surveys and analyzes international cities to find out which ones promote a healthy lifestyle. Each city is evaluated according to 10 different parameters, including obesity levels, happiness levels, and life expectancy by country. Here, we see a sample of the 2021 survey.
6. .corr() example: healthy_city
Quickly plotting the relationship between life expectancy and happiness levels indicates a possible linear relationship.
7. .corr() example: healthy_city
To find the strength of that linear relationship, let's first select the happiness levels column, followed by the .corr() function, and within the parentheses, include the life expectancy column, as shown. Our r value is positive, which tells us that an increase in happiness levels is related to an increase in life expectancy. The value 0.724 tells us that although this isn't a perfect relationship, the correlation between these two variables is quite strong.
8. Let's practice!
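As a warm-up, the healthy_city call can be sketched as follows. The column names happiness_levels and life_expectancy are assumptions about how the DataFrame is labeled, and the rows are placeholders rather than values from the report, so this will not reproduce 0.724.

```python
import pandas as pd

# Placeholder rows: NOT values from the Healthy Lifestyle Cities report.
healthy_city = pd.DataFrame({
    "happiness_levels": [7.4, 6.9, 5.9, 7.6, 5.1, 6.4],
    "life_expectancy": [81.2, 80.4, 78.8, 82.1, 75.4, 76.9],
})

r = healthy_city["happiness_levels"].corr(healthy_city["life_expectancy"])
# Pearson's r is symmetric, so swapping the columns gives the same value.
r_swapped = healthy_city["life_expectancy"].corr(healthy_city["happiness_levels"])

# Read the direction of the relationship off the sign of r.
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.3f} ({direction})")
```

Because the result is the same in either order, it doesn't matter which of the two columns we select first.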
We've learned some pretty interesting concepts. Let's test what we know so far!