Bootstrapping regression

Now let's see how bootstrapping works with regression. Bootstrapping helps estimate the uncertainty of non-standard estimators. Consider the \(R^{2}\) statistic associated with a regression. When you run a simple least squares regression, you get a single value for \(R^{2}\). Let's see how we can get a 95% CI for \(R^{2}\).

Use df.head() to examine the DataFrame df, which has a dependent variable \(y\) and two independent variables \(X1\) and \(X2\). We've already fit this regression with statsmodels (sm) using:

reg_fit = sm.OLS(df['y'], df.iloc[:,1:]).fit()

Examine the result using reg_fit.summary() to find that \(R^{2}=0.3504\). Use bootstrapping to calculate the 95% CI.
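For reference, the interval can be written out as follows. This is a sketch that assumes the percentile method, which is what the later np.percentile() instruction suggests; the symbols \(B\), \(R^{2*}_{b}\), and \(q_{p}\) are notation introduced here, not taken from the course.

```latex
% B bootstrap samples, each refit to give one replicate of R^2
R^{2*}_{1},\; R^{2*}_{2},\; \dots,\; R^{2*}_{B}

% Percentile 95% CI: the 2.5th and 97.5th percentiles of the replicates
\mathrm{CI}_{95\%} = \left[\, q_{0.025}\!\left(R^{2*}_{1},\dots,R^{2*}_{B}\right),\;
                           q_{0.975}\!\left(R^{2*}_{1},\dots,R^{2*}_{B}\right) \right]
```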

This is a part of the course

“Statistical Simulation in Python”


Exercise instructions

  • Draw a bootstrap sample from the original dataset using the sample() method of a pandas DataFrame. The number of rows should be the same as that of the original DataFrame.
  • Fit a regression similar to reg_fit using sm.OLS() and extract the \(R^{2}\) statistic using the attribute rsquared.
  • Append the \(R^{2}\) to the list rsquared_boot.
  • Calculate the 95% CI for rsquared_boot as r_sq_95_ci using np.percentile().

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

rsquared_boot, coefs_boot, sims = [], [], 1000
reg_fit = sm.OLS(df['y'], df.iloc[:,1:]).fit()

# Run 1K iterations
for i in range(sims):
    # First create a bootstrap sample with replacement with n=df.shape[0]
    bootstrap = ____
    # Fit the regression and append the r square to rsquared_boot
    rsquared_boot.append(____(bootstrap['y'],bootstrap.iloc[:,1:]).fit().rsquared)

# Calculate 95% CI on rsquared_boot
r_sq_95_ci = ____
print("R Squared 95% CI = {}".format(r_sq_95_ci))

This exercise is part of the course

Statistical Simulation in Python

Learn to solve increasingly complex problems using simulations to generate and analyze data.

In this chapter, we will get a brief introduction to resampling methods and their applications. We will get a taste of bootstrap resampling, jackknife resampling, and permutation testing. After completing this chapter, students will be able to start applying simple resampling methods for data analysis.

  • Exercise 1: Introduction to resampling methods
  • Exercise 2: Sampling with replacement
  • Exercise 3: Probability example
  • Exercise 4: Bootstrapping
  • Exercise 5: Running a simple bootstrap
  • Exercise 6: Non-standard estimators
  • Exercise 7: Bootstrapping regression
  • Exercise 8: Jackknife resampling
  • Exercise 9: Basic jackknife estimation - mean
  • Exercise 10: Jackknife confidence interval for the median
  • Exercise 11: Permutation testing
  • Exercise 12: Generating a single permutation
  • Exercise 13: Hypothesis testing - Difference of means
  • Exercise 14: Hypothesis testing - Non-standard statistics
