Are the Belmont Stakes results Normally distributed?
Since 1926, the Belmont Stakes is a 1.5 mile-long race of 3-year old thoroughbred horses. Secretariat ran the fastest Belmont Stakes in history in 1973. While that was the fastest year, 1970 was the slowest because of unusually wet and sloppy conditions. With these two outliers removed from the data set, compute the mean and standard deviation of the Belmont winners' times. Sample out of a Normal distribution with this mean and standard deviation using the rng.normal()
function and plot a CDF. Overlay the ECDF from the winning Belmont times. Are these close to Normally distributed?
Note: Justin scraped the data concerning the Belmont Stakes from the Belmont Wikipedia page.
This exercise is part of the course
Statistical Thinking in Python (Part 1)
Exercise instructions
- Compute mean and standard deviation of Belmont winners' times with the two outliers removed. The NumPy array
belmont_no_outliers
has these data. - Take 10,000 samples out of a normal distribution with this mean and standard deviation using
rng.normal()
. - Compute the CDF of the theoretical samples and the ECDF of the Belmont winners' data, assigning the results to
x_theor, y_theor
andx, y
, respectively. - Hit submit to plot the CDF of your samples with the ECDF, label your axes and show the plot.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Compute mean and standard deviation: mu, sigma
# Sample out of a normal distribution with this mu and sigma: samples
# Get the CDF of the samples and of the data
# Plot the CDFs and show the plot
_ = plt.plot(x_theor, y_theor)
_ = plt.plot(x, y, marker='.', linestyle='none')
_ = plt.xlabel('Belmont winning time (sec.)')
_ = plt.ylabel('CDF')
plt.show()