1. Model Estimation and Likelihood
Observed data is often treated as a sample taken from a larger population. We want to model the population, but we only have the sample.
In this lesson, you will use estimation to build models of population distributions from sample statistics.
2. Estimation
We start with a visual example.
Imagine we measured the distance traveled by a satellite each hour for a week, shown here as grey bars.
We then build a model of the distribution of distances traveled each hour for an entire year (our population), shown as the red curve.
To build the model, we assumed:
(1) the population model is shaped like a Gaussian.
(2) the sample statistics are good "ESTIMATES" for the population model parameters.
3. Estimation
Let's see it in code.
First, we define a Gaussian model. The mu and sigma parameters define the center and spread, respectively.
Second, we compute sample statistics, the sample mean and sample standard deviation.
Lastly, we use the sample `mean` and `stdev` as "ESTIMATES" of the population parameters, mu and sigma.
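The steps above can be sketched in code. This is a minimal sketch, not the course's exact script: the function name `gaussian_model` and the `sample_distances` values are illustrative assumptions.

```python
import numpy as np

def gaussian_model(x, mu, sigma):
    """Gaussian PDF: mu sets the center, sigma sets the spread."""
    coeff = 1.0 / (sigma * np.sqrt(2 * np.pi))
    return coeff * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Hypothetical sample: distances traveled each hour, measured for a week
sample_distances = np.array([98.5, 101.2, 99.7, 100.4, 97.9, 102.1, 100.0])

# Sample statistics, used as ESTIMATES of the population parameters
mean = np.mean(sample_distances)
stdev = np.std(sample_distances)

# Population model: evaluate the Gaussian over a grid of distances
distances_grid = np.linspace(mean - 4 * stdev, mean + 4 * stdev, 101)
population_model = gaussian_model(distances_grid, mu=mean, sigma=stdev)
```

Plotting `population_model` against `distances_grid` gives the red curve from the earlier figure, centered on the sample mean.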
But why? What is the "likelihood" that this specific model best "predicts" our given data?
4. Likelihood vs Probability
To help answer this question, we give names to some useful concepts.
A "CONDITIONAL" probability is stated as a question: what is the probability that A occurs, "given the condition" that B has already occurred?
In the context of data and models, there is a naming convention for conditionals.
If the MODEL is given, we ask what is the **PROBABILITY** that it outputs any particular data point.
If the DATA is given, we ask what is the **LIKELIHOOD** that a candidate model COULD output the particular data we have.
5. Computing Likelihood
If we had two candidate models, we'd want to choose the one with the greater likelihood of producing the given data.
But how do we compute likelihood?
Here we start with a Gaussian model (in red) with specific values for its parameters, mu and sigma.
Now, with the CONDITION that this model is given, we ask: what is the **PROBABILITY** that it would output one particular data point?
For a single distance, shown as the vertical blue line, the probability the model assigns to that distance is shown as the horizontal green line.
6. Computing Likelihood
Repeat this process for every distance in the sample and take the product of the resulting probabilities.
If the model peak is centered over most of the sample data, the product of probabilities will be large.
If the sample distances are far from the model peak, out in the tails of the model, the likelihood that the model could produce that data set is small.
7. Likelihood from Probabilities
In code, we start by using the sample statistics as guesses for the population model parameters mu and sigma.
Next, for each sample point, compute the probability by passing the distance and the guesses for mu and sigma into the model.
Then, we take the product of all those probabilities to compute the likelihood. Finally, we take the log of the likelihood, which has better numerical properties.
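Those three steps can be sketched as follows. Again a hedged sketch: `gaussian_model` and the `sample_distances` values are illustrative, not the course's exact code.

```python
import numpy as np

def gaussian_model(x, mu, sigma):
    """Gaussian PDF with center mu and spread sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical sample of hourly distances
sample_distances = np.array([98.5, 101.2, 99.7, 100.4, 97.9, 102.1, 100.0])

# Step 1: use the sample statistics as guesses for the population parameters
mu_guess = np.mean(sample_distances)
sigma_guess = np.std(sample_distances)

# Step 2: probability of each sample point under the candidate model
probabilities = gaussian_model(sample_distances, mu_guess, sigma_guess)

# Step 3: likelihood is the product; its log turns the product into a sum,
# which avoids numerical underflow for large samples
likelihood = np.prod(probabilities)
loglikelihood = np.sum(np.log(probabilities))
```

The log is a monotonic transform, so the model that maximizes the log-likelihood also maximizes the likelihood itself.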
8. Maximum Likelihood Estimation
Now we repeat the process for an **array** of models.
Here we try 101 guesses, centered on the sample mean.
For each iteration, we compute one log-likelihood, resulting in an array of 101 log-likelihoods.
Finally, we find the one guess that gives the maximum log-likelihood.
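A minimal sketch of that loop, under the same illustrative assumptions as before (hypothetical `sample_distances`, assumed name `gaussian_model`):

```python
import numpy as np

def gaussian_model(x, mu, sigma):
    """Gaussian PDF with center mu and spread sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

sample_distances = np.array([98.5, 101.2, 99.7, 100.4, 97.9, 102.1, 100.0])
mean, stdev = np.mean(sample_distances), np.std(sample_distances)

# 101 candidate values of mu, centered on the sample mean
mu_guesses = np.linspace(mean - 2 * stdev, mean + 2 * stdev, 101)

# One log-likelihood per candidate model
loglikelihoods = np.zeros(len(mu_guesses))
for i, mu_guess in enumerate(mu_guesses):
    probabilities = gaussian_model(sample_distances, mu_guess, stdev)
    loglikelihoods[i] = np.sum(np.log(probabilities))

# The best estimate of mu is the guess that maximizes the log-likelihood
best_mu = mu_guesses[np.argmax(loglikelihoods)]
```

For a Gaussian model, the log-likelihood is maximized at the sample mean, so `best_mu` lands on the grid point equal to `mean`.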
9. Maximum Likelihood Estimation
Here we plot the 101 log-likelihood values, one for each guess of mu.
The best estimate of the population mu is the guess that gives the maximum log-likelihood.
When the model is Gaussian, this maximum-likelihood mean matches the answer we'd get from least squares.
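Why does the maximum-likelihood mean agree with least squares? For a Gaussian, the log-likelihood is, up to additive constants, the negative sum of squared residuals scaled by 2*sigma^2, so maximizing one is minimizing the other. A small sketch, using the same hypothetical sample as above:

```python
import numpy as np

# Hypothetical sample of hourly distances
sample_distances = np.array([98.5, 101.2, 99.7, 100.4, 97.9, 102.1, 100.0])
mean, stdev = np.mean(sample_distances), np.std(sample_distances)

mu_guesses = np.linspace(mean - 2 * stdev, mean + 2 * stdev, 101)

# Least-squares criterion: sum of squared residuals for each guess of mu
sse = np.array([np.sum((sample_distances - mu) ** 2) for mu in mu_guesses])

# Gaussian log-likelihood for each guess, up to an additive constant
loglikelihoods = -sse / (2 * stdev ** 2)

# Both criteria select the same mu: the sample mean
mu_lsq = mu_guesses[np.argmin(sse)]
mu_mle = mu_guesses[np.argmax(loglikelihoods)]
```

Minimizing `sse` and maximizing `loglikelihoods` pick the same grid point, which is why least squares and maximum likelihood agree for Gaussian models.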
10. Let's practice!
Now it's your turn to estimate parameters using the maximum likelihood procedure.