Visualizing bootstrap samples
In this exercise, you will generate bootstrap samples from the set of annual rainfall data measured at the Sheffield Weather Station in the UK from 1883 to 2015. The data are stored in the NumPy array rainfall in units of millimeters (mm). By graphically displaying the bootstrap samples with an ECDF, you can get a feel for how bootstrap sampling allows probabilistic descriptions of data.
This exercise is part of the course
Statistical Thinking in Python (Part 2)
Exercise instructions
- Write a for loop to acquire 50 bootstrap samples of the rainfall data and plot their ECDF.
  - Use np.random.choice() to generate a bootstrap sample from the NumPy array rainfall. Be sure that the size of the resampled array is len(rainfall).
  - Use the function ecdf() that you wrote in the prequel to this course to generate the x and y values for the ECDF of the bootstrap sample bs_sample.
  - Plot the ECDF values. Specify color='gray' (to make gray dots) and alpha=0.1 (to make them semi-transparent, since we are overlaying so many) in addition to the marker='.' and linestyle='none' keyword arguments.
- Use ecdf() to generate x and y values for the ECDF of the original rainfall data available in the array rainfall.
- Plot the ECDF values of the original data.
- Hit submit to visualize the samples!
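The instructions rely on an ecdf() function from the prequel course, which is not reproduced here. As a rough sketch of what such a function might look like, and of how a single bootstrap sample is drawn, consider the following. The body of ecdf() is an assumption (a common way to compute an empirical CDF), and the rainfall array is randomly generated placeholder data, not the Sheffield measurements.

```python
import numpy as np


def ecdf(data):
    """Return x, y values for the empirical CDF of a 1-D array.

    Assumed implementation; the course's own ecdf() may differ in detail.
    """
    n = len(data)
    x = np.sort(data)             # observations in ascending order
    y = np.arange(1, n + 1) / n   # fraction of points <= each x value
    return x, y


# Placeholder data standing in for the real rainfall array (assumption):
# 133 values, one per year from 1883 to 2015 inclusive.
np.random.seed(42)
rainfall = np.random.normal(loc=800.0, scale=120.0, size=133)

# One bootstrap sample: draw len(rainfall) values with replacement.
# np.random.choice samples with replacement by default.
bs_sample = np.random.choice(rainfall, size=len(rainfall))

# ECDF values of the bootstrap sample
x, y = ecdf(bs_sample)
```

Because sampling is done with replacement, some original values appear more than once in bs_sample and others not at all, which is why each bootstrap ECDF differs slightly from the ECDF of the original data.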
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
for _ in range(50):
    # Generate bootstrap sample: bs_sample
    bs_sample = ____(____, size=____)
    # Compute and plot ECDF from bootstrap sample
    x, y = ____
    _ = plt.plot(____, ____, ____='.', ____='none',
                 ____='gray', ____=0.1)
# Compute and plot ECDF from original data
x, y = ____
_ = plt.plot(____, ____, ____='.')
# Make margins and label axes
plt.margins(0.02)
_ = plt.xlabel('yearly rainfall (mm)')
_ = plt.ylabel('ECDF')
# Show the plot
plt.show()
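For reference, one way the template above could be completed is sketched below as a self-contained script. It is an illustration under stated assumptions, not the course's official solution: the ecdf() body is an assumed implementation, the rainfall array is synthetic placeholder data, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np


def ecdf(data):
    """Return x, y values for the empirical CDF (assumed implementation)."""
    n = len(data)
    return np.sort(data), np.arange(1, n + 1) / n


# Synthetic stand-in for the real rainfall measurements (assumption)
np.random.seed(42)
rainfall = np.random.normal(loc=800.0, scale=120.0, size=133)

for _ in range(50):
    # Generate bootstrap sample: resample with replacement, same size as the data
    bs_sample = np.random.choice(rainfall, size=len(rainfall))

    # Compute and plot ECDF from bootstrap sample as faint gray dots
    x, y = ecdf(bs_sample)
    _ = plt.plot(x, y, marker='.', linestyle='none',
                 color='gray', alpha=0.1)

# Compute and plot ECDF from original data on top of the bootstrap ECDFs
x, y = ecdf(rainfall)
_ = plt.plot(x, y, marker='.')

# Make margins and label axes
plt.margins(0.02)
_ = plt.xlabel('yearly rainfall (mm)')
_ = plt.ylabel('ECDF')

# In an interactive session, call plt.show() here to display the figure.
```

The gray cloud formed by the 50 overlaid bootstrap ECDFs gives a visual sense of how much the ECDF could plausibly vary if the measurements were repeated.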