Exercise

# Looking at a Regression's R-Squared

R-squared measures how closely the data fit the regression line, so the R-squared in a simple regression is related to the correlation between the two variables. In particular, the magnitude of the correlation is the square root of the R-squared and the sign of the correlation is the sign of the regression coefficient.
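
In symbols, for the simple regression with estimated slope \(\small \hat\beta\) and correlation \(\small \rho_{xy}\) between the two variables:

\(\small R^2 = \rho_{xy}^{2}, \qquad \operatorname{sign}(\rho_{xy}) = \operatorname{sign}(\hat\beta).\)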

In this exercise, you will start using the statistical package `statsmodels`, which performs much of the statistical modeling and testing found in R and in software packages like SAS and MATLAB.

You will take two series, `x` and `y`, compute their correlation, and then regress `y` on `x` using the function `OLS(y, x)` in the `statsmodels.api` library (note that the dependent, or left-hand-side, variable `y` is the first argument). Most linear regressions contain a constant term, the intercept (the \(\small \alpha\) in the regression \(\small y_t=\alpha + \beta x_t + \epsilon_t\)). To include a constant using the function `OLS()`, you need to add a column of 1's to the right-hand side of the regression.
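For example, here is a quick sketch of what `sm.add_constant()` does; the data below are illustrative, since the exercise supplies its own `x`:

```python
import pandas as pd
import statsmodels.api as sm

# Illustrative data; in the exercise, x is provided for you.
dfx = pd.DataFrame({'x': [0.5, 1.0, 1.5]})

# add_constant() prepends a column of 1's named 'const',
# which becomes the intercept term alpha in the regression.
dfx1 = sm.add_constant(dfx)
print(dfx1)
#    const    x
# 0    1.0  0.5
# 1    1.0  1.0
# 2    1.0  1.5
```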

The module `statsmodels.api` has been imported for you as `sm`.

Instructions


- Compute the correlation between `x` and `y` using the `.corr()` method.
- Run a regression:
  - First convert the Series `x` to a DataFrame `dfx`.
  - Add a constant using `sm.add_constant()`, assigning it to `dfx1`.
  - Regress `y` on `dfx1` using `sm.OLS().fit()`.
- Print out the results of the regression and compare the R-squared with the correlation (see the sketch after this list).
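
Putting the steps together, here is a minimal sketch of the workflow. The synthetic `x` and `y` below are assumptions for illustration; in the exercise itself both series are provided:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Illustrative data; in the exercise, x and y are provided for you.
np.random.seed(0)
x = pd.Series(np.random.normal(size=100))
y = pd.Series(2.0 + 0.5 * x + np.random.normal(size=100))

# Compute the correlation between x and y.
correlation = x.corr(y)

# Convert the Series x to a DataFrame and add a constant column of 1's.
dfx = x.to_frame(name='x')
dfx1 = sm.add_constant(dfx)

# Regress y on dfx1 (the dependent variable y is the first argument).
result = sm.OLS(y, dfx1).fit()
print(result.summary())

# In a simple regression, R-squared equals the squared correlation.
print("correlation:     ", correlation)
print("sqrt(R-squared): ", np.sqrt(result.rsquared))
```

The two printed numbers should match (up to the sign of the correlation), which is exactly the relationship described at the top of this exercise.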