Looking at a Regression's R-Squared

R-squared measures how closely the data fit the regression line, so in a simple regression (one explanatory variable plus an intercept) the R-squared is tied directly to the correlation between the two variables. Specifically, the magnitude of the correlation is the square root of the R-squared, and the sign of the correlation is the sign of the slope coefficient.

In this exercise, you will start using the statistical package statsmodels, which performs much of the statistical modeling and testing that is found in R and software packages like SAS and MATLAB.

You will take two series, x and y, compute their correlation, and then regress y on x using the function OLS(y,x) in the statsmodels.api library (note that the dependent, or left-hand side, variable y is the first argument). Most linear regressions contain a constant term, which is the intercept (the \(\small \alpha\) in the regression \(\small y_t=\alpha + \beta x_t + \epsilon_t\)). To include a constant using the function OLS(), you need to add a column of 1's to the right-hand side of the regression.

The module statsmodels.api has been imported for you as sm.

This exercise is part of the course Time Series Analysis in Python.

Exercise instructions

  • Compute the correlation between x and y using the .corr() method.
  • Run a regression:
    • First convert the Series x to a DataFrame dfx.
    • Add a constant using sm.add_constant(), assigning it to dfx1.
    • Regress y on dfx1 using sm.OLS().fit().
  • Print out the results of the regression and compare the R-squared with the correlation.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import pandas (used below as pd) and the statsmodels module
import pandas as pd
import statsmodels.api as sm

# Compute correlation of x and y
correlation = ___
print("The correlation between x and y is %4.2f" %(correlation))

# Convert the Series x to a DataFrame and name the column x
dfx = pd.DataFrame(x, columns=['x'])

# Add a constant to the DataFrame dfx
dfx1 = sm.add_constant(___)

# Regress y on dfx1
result = sm.OLS(___, ___).fit()

# Print out the results and look at the relationship between R-squared and the correlation above
print(result.summary())