Exercise

Bootstrapping regression

Now let's see how bootstrapping works with regression. Bootstrapping helps estimate the uncertainty of non-standard estimators. Consider the \(R^{2}\) statistic associated with a regression. When you run a simple least squares regression, you get a value for \(R^{2}\). But how can we get a 95% CI for \(R^{2}\)?

Examine the DataFrame df, which contains a dependent variable \(y\) and two independent variables \(X1\) and \(X2\), using df.head(). We've already fit this regression with statsmodels (sm) using:

reg_fit = sm.OLS(df['y'], df.iloc[:,1:]).fit()

Examine the result using reg_fit.summary() to find that \(R^{2}=0.3504\). Use bootstrapping to calculate the 95% CI.
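To make the setup concrete, here is a minimal sketch of the baseline fit. The synthetic df below is only a stand-in for the exercise data (so its \(R^{2}\) will not match the 0.3504 reported above); in the exercise itself, df is already loaded for you.

# Minimal sketch of the baseline fit, using synthetic stand-in data
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
df = pd.DataFrame({'y': 0.5 * X1 + 0.3 * X2 + rng.normal(size=n),
                   'X1': X1, 'X2': X2})

# Regress y on every remaining column (X1 and X2), as in the exercise
reg_fit = sm.OLS(df['y'], df.iloc[:, 1:]).fit()
print(reg_fit.summary())    # full regression table
print(reg_fit.rsquared)     # the R^2 statistic on its own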

Instructions

  • Draw a bootstrap sample from the original dataset using the sample() method of a pandas DataFrame. Sample with replacement, keeping the number of rows the same as in the original DataFrame.
  • Fit a regression on the bootstrap sample, analogous to reg_fit, using sm.OLS(), and extract the \(R^{2}\) statistic from the rsquared attribute.
  • Append the \(R^{2}\) to the list rsquared_boot.
  • Calculate the 95% CI for rsquared_boot as r_sq_95_ci using np.percentile() (a sketch of the whole loop follows below).
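
For reference, one way the full bootstrap loop could look. This is only a sketch using the variable names from the instructions (rsquared_boot, r_sq_95_ci), not the graded solution, and it assumes df is available as above.

import numpy as np
import statsmodels.api as sm

rsquared_boot, n_boots = [], 1000
for _ in range(n_boots):
    # Resample rows with replacement, keeping the original number of rows
    boot_sample = df.sample(n=len(df), replace=True)
    # Refit the same regression on the bootstrap sample and store its R^2
    boot_fit = sm.OLS(boot_sample['y'], boot_sample.iloc[:, 1:]).fit()
    rsquared_boot.append(boot_fit.rsquared)

# Percentile-method 95% CI: the 2.5th and 97.5th percentiles of the
# bootstrap R^2 distribution
r_sq_95_ci = np.percentile(rsquared_boot, [2.5, 97.5])
print(r_sq_95_ci)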