How often do we get no-hitters?
The number of games played between consecutive no-hitters in the modern era (1901-2015) of Major League Baseball is stored in the array nohitter_times.
If you assume that no-hitters are described by a Poisson process, then the time between no-hitters is Exponentially distributed. As you have seen, the Exponential distribution has a single parameter, which we will call \(\tau\), the typical interval time. The value of \(\tau\) that makes the Exponential distribution best match the data is the mean interval time (in units of number of games) between no-hitters.
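Written out, with \(t_1, \dots, t_n\) denoting the \(n\) observed inter-no-hitter times (notation introduced here for illustration), the Exponential PDF and the best-fit (maximum-likelihood) value of \(\tau\) are:

\[
f(t;\tau) = \frac{1}{\tau}\, e^{-t/\tau}, \quad t \ge 0, \qquad \hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} t_i .
\]

So estimating \(\tau\) amounts to averaging the observed interval times.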
Compute the value of this parameter from the data. Then, use np.random.exponential() to "repeat" the history of Major League Baseball by drawing inter-no-hitter times from an Exponential distribution with the \(\tau\) you found, and plot the histogram as an approximation to the PDF.
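np.random.exponential() takes the mean \(\tau\) as its first argument (scale), not the rate \(1/\tau\). The minimal sketch below checks this on synthetic data: true_tau = 750.0 is an arbitrary value assumed for illustration, not a value computed from nohitter_times.

```python
import numpy as np

# Hypothetical "true" mean interval (games), chosen only for illustration
true_tau = 750.0

# Seed for reproducibility, as in the exercise
np.random.seed(42)

# First argument is the scale (= mean interval tau), second is the sample size
samples = np.random.exponential(true_tau, 100000)

# The sample mean estimates tau; with 100,000 draws it should land close to true_tau
tau_hat = np.mean(samples)
print(f"true tau = {true_tau}, estimated tau = {tau_hat:.1f}")
```

If the sample mean came out near \(1/\tau\) instead, the rate would have been passed where the scale was expected.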
NumPy, pandas, matplotlib.pyplot, and seaborn have been imported for you as np, pd, plt, and sns, respectively.
This exercise is part of the course Statistical Thinking in Python (Part 2).
Exercise instructions
- Seed the random number generator with 42.
- Compute the mean time (in units of number of games) between no-hitters.
- Draw 100,000 samples from an Exponential distribution whose parameter is the mean of the inter-no-hitter times you just computed.
- Plot a histogram of the samples, as an approximation to the theoretical PDF, using plt.hist(). Use the keyword arguments bins=50, density=True, and histtype='step' (older Matplotlib versions used normed=True, which newer versions no longer accept). Be sure to label your axes.
- Show your plot.
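A completed version of the sample code follows; nohitter_times is the array provided by the exercise environment.
# Imported for you in the exercise; repeated here so the snippet stands alone
import numpy as np
import matplotlib.pyplot as plt
# Seed random number generator
np.random.seed(42)
# Compute mean no-hitter time: tau
tau = np.mean(nohitter_times)
# Draw out of an exponential distribution with parameter tau: inter_nohitter_time
inter_nohitter_time = np.random.exponential(tau, 100000)
# Plot a normalized histogram (approximate PDF) and label axes
_ = plt.hist(inter_nohitter_time,
             bins=50, density=True, histtype='step')
_ = plt.xlabel('Games between no-hitters')
_ = plt.ylabel('PDF')
# Show the plot
plt.show()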
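To check numerically how closely a density-normalized histogram tracks the Exponential PDF, one option is np.histogram() with the same bins=50 and density=True settings, compared against \(e^{-t/\tau}/\tau\) at each bin center. This is a sketch on synthetic draws with an assumed \(\tau\) of 750 games standing in for the value computed from nohitter_times.

```python
import numpy as np

# Assumed mean interval for illustration (not computed from nohitter_times)
tau = 750.0

np.random.seed(42)
draws = np.random.exponential(tau, 100000)

# Same binning and normalization as plt.hist(..., bins=50, density=True)
heights, edges = np.histogram(draws, bins=50, density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Analytic Exponential PDF evaluated at the bin centers
pdf = np.exp(-centers / tau) / tau

# With density=True, the bar areas sum to 1
area = np.sum(heights * np.diff(edges))

# Largest histogram-vs-PDF gap over the first 10 bins, where most draws fall
max_gap = np.max(np.abs(heights[:10] - pdf[:10]))
print(f"area = {area:.3f}, max gap in first 10 bins = {max_gap:.2e}")
```

The gap is small relative to the peak density \(1/\tau\), which is why the step histogram serves as a reasonable stand-in for the PDF.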