Exercise

Looking at a Regression's R-Squared

R-squared measures how closely the data fit the regression line, so in a simple regression the R-squared is directly related to the correlation between the two variables. In particular, the magnitude of the correlation is the square root of the R-squared, and the sign of the correlation is the sign of the slope coefficient.
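In symbols, for a simple regression fitted with an intercept:

\[\small R^2 = r_{xy}^{2}, \qquad \operatorname{sign}(r_{xy}) = \operatorname{sign}(\hat\beta)\]

where \(\small r_{xy}\) is the sample correlation between x and y and \(\small \hat\beta\) is the fitted slope coefficient.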

In this exercise, you will start using the statistical package statsmodels, which performs much of the statistical modeling and testing that is found in R and software packages like SAS and MATLAB.

You will take two series, x and y, compute their correlation, and then regress y on x using the function OLS(y, x) in the statsmodels.api library (note that the dependent, or left-hand side, variable y is the first argument). Most linear regressions contain a constant term, which is the intercept (the \(\small \alpha\) in the regression \(\small y_t=\alpha + \beta x_t + \epsilon_t\)). OLS() does not add a constant automatically, so to include one you need to add a column of 1's to the right-hand side of the regression.
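As a quick illustration of what the column of 1's looks like, here is a minimal sketch (the array values and the name X are purely illustrative):

    import numpy as np
    import statsmodels.api as sm

    # add_constant prepends a column of 1's, which becomes the intercept term
    X = sm.add_constant(np.array([2.0, 3.0, 5.0]))
    print(X)
    # [[1. 2.]
    #  [1. 3.]
    #  [1. 5.]]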

The module statsmodels.api has been imported for you as sm.

Instructions
  • Compute the correlation between x and y using the .corr() method.
  • Run a regression:
    • First convert the Series x to a DataFrame dfx.
    • Add a constant using sm.add_constant(), assigning the result to dfx1.
    • Regress y on dfx1 using sm.OLS().fit().
  • Print out the results of the regression and compare the R-squared with the correlation; a worked sketch follows this list.
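One way the finished exercise might look, as a sketch (the data below is illustrative only; in the exercise, the series x and y are provided for you):

    import pandas as pd
    import statsmodels.api as sm

    # Stand-in data for the exercise's series (illustrative only)
    x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
    y = pd.Series([1.1, 1.9, 3.2, 3.9, 5.1])

    # Compute the correlation between x and y
    correlation = x.corr(y)
    print("correlation:", correlation)

    # Convert the Series x to a DataFrame, then add a constant column
    dfx = x.to_frame()
    dfx1 = sm.add_constant(dfx)

    # Regress y on dfx1; the dependent variable y is the first argument
    results = sm.OLS(y, dfx1).fit()
    print(results.summary())

    # The R-squared should equal the squared correlation
    print("correlation squared:", correlation ** 2)
    print("R-squared:", results.rsquared)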