Bootstrapping regression
Now let's see how bootstrapping works with regression. Bootstrapping helps estimate the uncertainty of non-standard estimators. Consider the \(R^{2}\) statistic associated with a regression. When you run a simple least squares regression, you get a single value for \(R^{2}\). But how can we get a 95% CI for \(R^{2}\)?
Use df.head() to examine the DataFrame df, which has a dependent variable \(y\) and two independent variables \(X1\) and \(X2\). We've already fit this regression with statsmodels (sm) using:
reg_fit = sm.OLS(df['y'], df.iloc[:,1:]).fit()
Examine the result using reg_fit.summary() to find that \(R^{2}=0.3504\). Use bootstrapping to calculate the 95% CI.
This exercise is part of the course
Statistical Simulation in Python
Exercise instructions
- Draw a bootstrap sample from the original dataset using the sample() method of a pandas DataFrame. The number of rows should be the same as that of the original DataFrame.
- Fit a regression similar to reg_fit using sm.OLS() and extract the \(R^{2}\) statistic using the attribute rsquared.
- Append the \(R^{2}\) to the list rsquared_boot.
- Calculate the 95% CI for rsquared_boot as r_sq_95_ci using np.percentile().
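The two building blocks named in these steps, resampling rows with replacement and taking a percentile interval, can be tried in isolation first. The snippet below is a minimal sketch: the DataFrame `toy` and the array `values` are made-up stand-ins for illustration, not the exercise's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the exercise's df
toy = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X1": [0.5, 1.5, 2.5, 3.5, 4.5],
})

# Resample rows with replacement, keeping the original number of rows;
# some rows will typically appear more than once, others not at all
boot = toy.sample(n=toy.shape[0], replace=True, random_state=0)
print(boot)

# Percentile interval: the 2.5th and 97.5th percentiles bracket the middle 95%
values = np.arange(1, 101)  # stand-in for a list of bootstrapped statistics
ci = np.percentile(values, [2.5, 97.5])
print(ci)
```

Passing a list of two percentiles to np.percentile() returns both interval endpoints in one call.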
Interactive exercise
Complete the sample code to successfully finish this exercise.
rsquared_boot, coefs_boot, sims = [], [], 1000
reg_fit = sm.OLS(df['y'], df.iloc[:,1:]).fit()
# Run 1K iterations
for i in range(sims):
    # First create a bootstrap sample with replacement with n=df.shape[0]
    bootstrap = ____
    # Fit the regression and append the r square to rsquared_boot
    rsquared_boot.append(____(bootstrap['y'],bootstrap.iloc[:,1:]).fit().rsquared)
# Calculate 95% CI on rsquared_boot
r_sq_95_ci = ____
print("R Squared 95% CI = {}".format(r_sq_95_ci))