Bootstrap and Standard Error
Imagine a national park where rangers hike each day as part of maintaining the park trails. They don't always take the same path, but they do record their final distance and time. We'd like to build a statistical model of the variation in daily distance traveled, using a limited sample of data from one ranger.
Your goal is to use bootstrap resampling, computing one mean for each resample, to create a distribution of means. You will then compute the standard error, which is the spread of that distribution, to quantify the uncertainty in the sample statistic as an estimator of the population statistic.
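One way to write these quantities in symbols is sketched below. The notation is an assumption made for illustration and is not taken from the course: B is the number of resamples, n is the size of each resample, and x*_{b,i} is the i-th value drawn in resample b. The 1/B factor matches the default behavior of np.std(), which divides by the number of values rather than by one less.

```latex
% Mean of the b-th bootstrap resample
\bar{x}^{*}_{b} = \frac{1}{n} \sum_{i=1}^{n} x^{*}_{b,i}, \qquad b = 1, \dots, B

% Center of the bootstrap distribution of means
\bar{x}^{*}_{\cdot} = \frac{1}{B} \sum_{b=1}^{B} \bar{x}^{*}_{b}

% Bootstrap estimate of the standard error: the spread of the resample means
\widehat{\mathrm{SE}} = \sqrt{\frac{1}{B} \sum_{b=1}^{B} \left( \bar{x}^{*}_{b} - \bar{x}^{*}_{\cdot} \right)^{2}}
```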
Use the preloaded sample_data array of 500 independent measurements of distance traveled. For now, we use a simulated data set to simplify this lesson. Later, we'll see more realistic data.
This is a part of the course “Introduction to Linear Modeling in Python”.
Exercise instructions
- Assign the sample_data as the model for the population.
- Iterate num_resamples times:
  - Use np.random.choice() each time to generate a bootstrap_sample of size=resample_size taken from the population_model and specify replace=True.
  - Compute and store the sample mean each time.
- Compute and print the np.mean() and np.std() of bootstrap_means.
- Use the predefined plot_data_hist() and visualize the bootstrap_means distribution.
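The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's reference solution: sample_data is replaced by a simulated stand-in with made-up parameters, the values of num_resamples and resample_size are chosen for illustration, and the plotting step is left out because plot_data_hist() is a course helper whose definition is not shown here.

```python
import numpy as np

np.random.seed(42)  # fixed seed so the sketch is repeatable

# Assumption: simulated stand-in for the preloaded sample_data
# (500 distances; the mean of 5.0 and spread of 1.0 are made up).
sample_data = np.random.normal(loc=5.0, scale=1.0, size=500)

# Assumption: the exercise predefines these; the values here are illustrative.
num_resamples = 100
resample_size = 500
bootstrap_means = np.zeros(num_resamples)

# Use the sample as a model for the population
population_model = sample_data

# Draw each resample with replacement and store its mean
for nr in range(num_resamples):
    bootstrap_sample = np.random.choice(population_model, size=resample_size, replace=True)
    bootstrap_means[nr] = np.mean(bootstrap_sample)

# Center and spread of the distribution of resample means
distribution_mean = np.mean(bootstrap_means)
standard_error = np.std(bootstrap_means)
print('Bootstrap Distribution: center={:0.1f}, spread={:0.1f}'.format(distribution_mean, standard_error))
```

Sampling with replace=True is what lets a resample of the same size as the original data differ from it: some measurements appear more than once and others not at all.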
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Use the sample_data as a model for the population
population_model = ____
# Resample the population_model 100 times, computing the mean each sample
for nr in range(num_resamples):
    bootstrap_sample = np.random.____(population_model, size=____, replace=____)
    bootstrap_means[nr] = np.____(bootstrap_sample)
# Compute and print the mean, stdev of the resample distribution of means
distribution_mean = np.mean(____)
standard_error = np.std(____)
print('Bootstrap Distribution: center={:0.1f}, spread={:0.1f}'.format(____, ____))
# Plot the bootstrap resample distribution of means
fig = plot_data_hist(____)
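As a sanity check on the bootstrap result, the spread of the resample means can be compared with the textbook standard error of the mean, the sample standard deviation divided by the square root of the sample size. The sketch below is an illustration only and assumes a simulated stand-in for sample_data with made-up parameters. It also draws all resamples in a single call rather than in a loop, which is a design choice of this sketch and not the form the exercise asks for.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

# Assumption: simulated stand-in for sample_data; the parameters are made up.
sample_data = np.random.normal(loc=5.0, scale=1.0, size=500)
n = len(sample_data)

# Draw all resamples at once as a (num_resamples, n) array,
# then average across each row to get one mean per resample.
num_resamples = 1000
resamples = np.random.choice(sample_data, size=(num_resamples, n), replace=True)
bootstrap_means = resamples.mean(axis=1)

# Bootstrap estimate: spread of the resample means
bootstrap_se = np.std(bootstrap_means)

# Analytic estimate: sample standard deviation over sqrt(n)
analytic_se = np.std(sample_data) / np.sqrt(n)

print('bootstrap SE={:0.3f}, analytic SE={:0.3f}'.format(bootstrap_se, analytic_se))
```

The two estimates should land close to each other when each resample is the same size as the original sample; a large gap would suggest a mistake in the resampling step, such as forgetting replace=True.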
This exercise is part of the course Introduction to Linear Modeling in Python
Explore the concepts and applications of linear models with Python and build models to describe, predict, and extract insight from data patterns.
In our final chapter, we introduce concepts from inferential statistics, and use them to explore how maximum likelihood estimation and bootstrap resampling can be used to estimate linear model parameters. We then apply these methods to make probabilistic statements about our confidence in the model parameters.