How often do we get no-hitters?

The number of games played between consecutive no-hitters in the modern era (1901-2015) of Major League Baseball is stored in the array nohitter_times.

If you assume that no-hitters follow a Poisson process, then the time between no-hitters is exponentially distributed. As you have seen, the Exponential distribution has a single parameter, which we will call $τ$, the typical interval time. The value of $τ$ that makes the Exponential distribution best match the data is the mean interval time between no-hitters, measured in number of games.
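One way to see why the mean is the best-fit value, reading "best match" as maximum likelihood: with the Exponential PDF $f(t;τ) = e^{-t/τ}/τ$ and observed intervals $t_1, \dots, t_n$, setting the derivative of the log-likelihood to zero gives the sample mean. This is a sketch of the standard derivation, not part of the original exercise text.

$$
\begin{aligned}
\ln L(\tau) &= \sum_{i=1}^{n} \ln\!\left(\frac{1}{\tau} e^{-t_i/\tau}\right) = -n\ln\tau - \frac{1}{\tau}\sum_{i=1}^{n} t_i,\\
\frac{d\ln L}{d\tau} &= -\frac{n}{\tau} + \frac{1}{\tau^{2}}\sum_{i=1}^{n} t_i = 0
\quad\Longrightarrow\quad \hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} t_i .
\end{aligned}
$$

The second derivative at $\hat{\tau}$ is $n/\hat{\tau}^{2} - 2n/\hat{\tau}^{2} = -n/\hat{\tau}^{2} < 0$, so this stationary point is a maximum.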

Compute the value of this parameter from the data. Then, use np.random.exponential() to "repeat" the history of Major League Baseball by drawing inter-no-hitter times from an exponential distribution with the $τ$ you found and plot the histogram as an approximation to the PDF.

NumPy, pandas, matplotlib.pyplot, and seaborn have been imported for you as np, pd, plt, and sns, respectively.

Seed the random number generator with 42.
Compute the mean time (in units of number of games) between no-hitters.
Draw 100,000 samples from an Exponential distribution with the parameter you computed from the mean of the inter-no-hitter times.
Plot a histogram of the samples with plt.hist() as an approximation to the theoretical PDF. Use the keyword arguments bins=50, density=True (older Matplotlib versions called this normed=True), and histtype='step'. Be sure to label your axes.
Show your plot.
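The steps above can be sketched as follows. This is a minimal sketch, not the course's reference solution: the function names simulate_nohitter_gaps and plot_pdf_histogram are hypothetical, and nohitter_times is assumed to be the 1-D array supplied by the exercise environment (its values are not reproduced here). The sketch uses density=True because current Matplotlib no longer accepts normed=True.

```python
import numpy as np
import matplotlib.pyplot as plt


def simulate_nohitter_gaps(nohitter_times, size=100_000, seed=42):
    """Estimate tau and draw simulated inter-no-hitter times (hypothetical helper).

    nohitter_times: 1-D array of games between consecutive no-hitters,
    assumed to be provided by the exercise environment.
    """
    # Seed NumPy's global random number generator, as the exercise asks.
    np.random.seed(seed)

    # The best-fit Exponential parameter is the mean interval, in games.
    tau = np.mean(nohitter_times)

    # "Repeat history": draw inter-no-hitter times from an Exponential(tau).
    inter_nohitter_time = np.random.exponential(tau, size)
    return tau, inter_nohitter_time


def plot_pdf_histogram(samples, ax=None):
    """Plot a normalized step histogram approximating the PDF (hypothetical helper)."""
    if ax is None:
        _, ax = plt.subplots()
    # density=True normalizes the histogram so its total area is 1.
    ax.hist(samples, bins=50, density=True, histtype="step")
    ax.set_xlabel("Games between no-hitters")
    ax.set_ylabel("PDF")
    return ax
```

In the exercise environment, calling `tau, draws = simulate_nohitter_gaps(nohitter_times)`, then `plot_pdf_histogram(draws)` and `plt.show()`, produces the plot. Keeping sampling and plotting in separate functions lets you check the seeded draws without opening a figure.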
