1. Optimal parameters
After completing the prequel to this course, you are now beginning to think probabilistically. Outcomes of measurements follow probability distributions defined by the story of how the data came to be. When we looked at Michelson's speed-of-light-in-air measurements, we assumed that the results were Normally distributed.
2. Histogram of Michelson's measurements
We verified that by looking at both the PDF and
3. CDF of Michelson's measurements
the CDF, which was more effective because there is no binning bias. To compute and plot the CDF, we needed our old friends
4. Checking Normality of Michelson data
NumPy and matplotlib dot pyplot, so the first step was to import them with their traditional aliases. To compute the theoretical CDF by sampling, we passed two parameters into np dot random dot normal, the mean and standard deviation. The values we chose for these parameters were in fact the mean and standard deviation we calculated directly from the data.
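The steps just described can be sketched as follows. The array `michelson_speed_of_light` is a hypothetical stand-in for the real measurements, which are not reproduced in this video; the parameter values used to generate it are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for Michelson's measurements (assumed roughly
# Normal); the real course data is not reproduced here.
rng = np.random.default_rng(42)
michelson_speed_of_light = rng.normal(299852.4, 79.0, size=100)

# Optimal parameter estimates: the mean and standard deviation
# computed directly from the data
mean = np.mean(michelson_speed_of_light)
std = np.std(michelson_speed_of_light)

# Theoretical CDF "by sampling": draw many samples from a Normal
# parameterized by the values computed from the data, as in the video
samples = np.random.normal(mean, std, size=10000)
```

With 10,000 samples, the empirical CDF of `samples` is a close stand-in for the exact theoretical CDF.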
5. CDF of Michelson's measurements
The result was that the theoretical CDF overlaid beautifully with the empirical CDF. How did we know that the mean and standard deviation calculated from the data were the appropriate values for the Normal parameters? We could have chosen others.
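The overlay described above can be sketched like this. The `ecdf` helper is the standard sort-and-rank construction of an empirical CDF; the data array here is synthetic, standing in for the real measurements.

```python
import numpy as np
import matplotlib.pyplot as plt

def ecdf(data):
    """Return x, y values for plotting an empirical CDF:
    sort the data and give each point an equal share of probability."""
    x = np.sort(data)
    y = np.arange(1, len(data) + 1) / len(data)
    return x, y

# Illustrative data in place of the real measurements
rng = np.random.default_rng(0)
data = rng.normal(300, 10, size=100)

# Empirical CDF of the data and of theoretical samples drawn using the
# mean and standard deviation computed from the data
x, y = ecdf(data)
x_theor, y_theor = ecdf(np.random.normal(np.mean(data), np.std(data), 10000))

# Overlay the two: a close match supports the Normal model
plt.plot(x_theor, y_theor)
plt.plot(x, y, marker='.', linestyle='none')
plt.xlabel('measurement')
plt.ylabel('CDF')
```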
6. CDF with bad estimate of st. dev.
What if the standard deviation differs by 50%? The CDFs no longer match. Or if the mean
7. CDF with bad estimate of mean
varies by just point-zero-one percent. So, if we believe that the process that generates our data gives Normally distributed results,
8. Optimal parameters
the set of parameters that brings the model, in this case a Normal distribution, into closest agreement with the data is the mean and standard deviation computed directly from the data. These are the optimal parameters. Remember, though, the parameters are only optimal for
9. Mass of MA large mouth bass
the model you chose for your data. When your model is wrong, the optimal parameters are not really meaningful. Finding the optimal parameters is not always as easy as just computing the mean and standard deviation of the data. We will encounter this later in this chapter when we do linear regressions and rely on built-in NumPy functions to find the optimal parameters for us. I pause to note that
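As a preview of what those built-in NumPy functions look like, np dot polyfit performs a least-squares fit; with degree one it returns the optimal slope and intercept of a linear model. The data below is synthetic, not the bass data from the slide.

```python
import numpy as np

# Synthetic linear data with Normal noise (illustrative only;
# slope 2.5 and intercept 1.0 are made up for this sketch)
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(0, 0.5, size=x.size)

# np.polyfit with degree 1 finds the least-squares optimal
# parameters of a linear model: slope and intercept
slope, intercept = np.polyfit(x, y, 1)
```

Here the optimal parameters cannot be read off as a simple mean and standard deviation; the optimization is done for us.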
10. Packages to do statistical inference
there are great tools in the Python ecosystem for doing statistical inference, including by optimization, scipy dot stats and
11. Packages to do statistical inference
statsmodels being two good examples. In this course, however,
12. Packages to do statistical inference
we focus on hacker statistics because the technique is like a Swiss Army knife; the same simple principle is applicable to a wide variety of statistical problems.
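One representative hacker-statistics move, shown here as an illustrative sketch with made-up data rather than anything from this video, is the bootstrap: simulate repeated experiments by resampling the observed data with replacement and recomputing the statistic of interest each time.

```python
import numpy as np

# Illustrative data; any measured sample would do
rng = np.random.default_rng(7)
data = rng.normal(300, 10, size=100)

# Resample the data with replacement many times, computing the mean
# of each resample -- the same simple principle works for almost any
# statistic, which is what makes it a Swiss Army knife
bs_means = np.array([np.mean(rng.choice(data, size=len(data)))
                     for _ in range(10000)])

# The spread of the bootstrap replicates estimates the uncertainty,
# e.g. a 95% confidence interval for the mean
ci_low, ci_high = np.percentile(bs_means, [2.5, 97.5])
```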
13. Let's practice!
Now it's time for you to do some exercises to demonstrate how choosing optimal parameters results in best agreement between the theoretical model distribution and your data.