1. Model Optimization
Earlier in the course, we saw a conceptual overview of the steps in building a model.
We started with the general form of a linear model and then **visually** adjusted the specific parameter values to get the "best" fit.
In this lesson, we'll see a **quantitative** method for finding the optimal parameter values: the ones that produce the model that fits the data better than any other of the same form.
We will express the "fitting" as an optimization problem, and show how optimization minimizes the errors, expressed as a "cost function" of the residuals.
2. Residuals
Recall that a truncated Taylor series, such as a "first order" linear model, is always an APPROXIMATION.
But we can quantify the difference between the model and the data.
This difference is called a "residual", and an example is shown here, again with the sea level data.
The top panel shows the data as black dots, and the model as a red line.
The second panel below shows the residuals, that is, the difference between the model array and the data array, as green dots.
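As a concrete sketch of this computation (not the course's actual code), residuals can be computed with NumPy; the arrays `times` and `sea_levels` below are hypothetical stand-ins for the plotted data, and a0, a1 are example parameter values:

```python
import numpy as np

# Hypothetical stand-ins for the plotted sea level data (not the course's actual arrays)
times = np.arange(10)  # e.g., years since the start of the record
sea_levels = np.array([4.8, 5.1, 5.3, 5.9, 6.2, 6.1, 6.8, 7.0, 7.3, 7.6])

# Linear model with example parameter values: intercept a0, slope a1
a0, a1 = 5.0, 0.1
model = a0 + a1 * times

# Residuals: the difference between the model array and the data array
residuals = model - sea_levels
print(residuals)
```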
3. Residuals Summed
Notice that in this plot, the model chosen has an intercept equal to the mean of y.
To quantify the overall difference, we want to sum the residuals, but in this case the positive and negative residuals cancel out to zero when summed.
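This cancellation is exact for this choice of model: each residual is ȳ − y_i, so the sum is n·ȳ − Σ y_i = 0. A minimal sketch, using hypothetical data in place of the plotted arrays:

```python
import numpy as np

y_data = np.array([4.8, 5.1, 5.3, 5.9, 6.2, 6.1, 6.8, 7.0, 7.3, 7.6])  # hypothetical

# Flat model: intercept equal to the mean of y, slope zero
y_model = np.full_like(y_data, y_data.mean())

residuals = y_model - y_data
print(residuals.sum())  # 0.0 (up to floating-point roundoff)
```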
4. Residuals Squared
So instead of the residuals themselves, we use the squared residuals.
Here, the blue squares on the plot represent the squares of the residuals.
This has two advantages.
First, the squared residuals do not sum to zero.
Second, they penalize larger residuals disproportionately more than smaller ones.
This is a good feature when trying to find a quantity or "cost function" to constrain our optimization of model parameters.
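Both advantages can be checked directly; a minimal sketch, continuing with the same hypothetical arrays:

```python
import numpy as np

y_data = np.array([4.8, 5.1, 5.3, 5.9, 6.2, 6.1, 6.8, 7.0, 7.3, 7.6])  # hypothetical
y_model = np.full_like(y_data, y_data.mean())
residuals = y_model - y_data

print(residuals.sum())        # ~0: positive and negative residuals cancel
print((residuals**2).sum())   # > 0: squared residuals cannot cancel

# Squaring penalizes large residuals disproportionately:
# a residual twice as large contributes four times as much
print((2 * residuals[0])**2 / residuals[0]**2)  # 4.0
```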
5. RSS
So, the single aggregate quantity we choose to guide how we optimize the model will be the sum of the squared residuals.
This is called the "RSS", short for residual sum of squares: in symbols, RSS = Σ (y_i − (a0 + a1 x_i))².
We will use this to find the "optimal" model parameters a0 and a1.
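As a sketch, RSS can be packaged as a small function of the parameters; the helper name `model_rss` and the example arrays below are illustrative, not from the course code:

```python
import numpy as np

def model_rss(a0, a1, x, y):
    """Residual sum of squares (RSS) for the linear model a0 + a1*x."""
    residuals = (a0 + a1 * x) - y
    return np.sum(residuals**2)

# Hypothetical example data and trial parameter values
x = np.arange(10)
y = np.array([4.8, 5.1, 5.3, 5.9, 6.2, 6.1, 6.8, 7.0, 7.3, 7.6])
print(model_rss(5.0, 0.1, x, y))
```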
6. RSS
Notice that here we picked better values for a0 and a1, namely a0=5 and a1=0.08, and RSS went from 65 down to 6.
This is the game: vary the model parameters, a0 and a1, until the measure of model fit, RSS, is smallest. In this way, minimizing RSS gives us the "optimal" values for our model parameters.
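As an aside, the same game can be automated with a generic numerical optimizer such as scipy.optimize.minimize, rather than adjusting the parameters by hand. A minimal sketch under that assumption, again with hypothetical data:

```python
import numpy as np
from scipy.optimize import minimize

x = np.arange(10)  # hypothetical data
y = np.array([4.8, 5.1, 5.3, 5.9, 6.2, 6.1, 6.8, 7.0, 7.3, 7.6])

def rss(params):
    """Cost function: RSS as a function of the parameter vector (a0, a1)."""
    a0, a1 = params
    return np.sum(((a0 + a1 * x) - y)**2)

# Vary a0 and a1 from an initial guess until RSS is smallest
result = minimize(rss, x0=[0.0, 0.0])
print(result.x, result.fun)  # optimal (a0, a1) and the minimal RSS
```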
7. Variation of RSS
Plotted here is an example of another "optimization" problem.
We take many values of the slope a1, and for each a1 we build a model and compute a single RSS value.
Then we plot RSS on the y-axis versus a1 on the x-axis to get the curve shown.
Notice that each individual value of RSS does NOT matter, only that we find the minimum value.
The minimum value of RSS is at the bottom of the upturned curve.
It gives us the smallest residuals overall, and so the best model. So we use the corresponding value of a1, in this case about 25, as the final value for building our model.
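A minimal sketch of this sweep, with hypothetical data and slope range (so the best a1 here differs from the "about 25" on the slide); np.argmin picks out the a1 at the bottom of the curve:

```python
import numpy as np

x = np.arange(10)  # hypothetical data
y = np.array([4.8, 5.1, 5.3, 5.9, 6.2, 6.1, 6.8, 7.0, 7.3, 7.6])

a0 = 4.8                               # hold the intercept fixed for this sweep
a1_values = np.linspace(0.0, 0.6, 61)  # many trial values of the slope

# One model, and so one RSS value, per trial slope
rss = np.array([np.sum(((a0 + a1 * x) - y)**2) for a1 in a1_values])

best_a1 = a1_values[np.argmin(rss)]    # the a1 at the bottom of the upturned curve
print(best_a1)

# Plotting rss versus a1_values (e.g., with matplotlib) reproduces the curve shown
```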
8. Let's practice!
Later we will see that variables can have an innate randomness that no Taylor series can model, no matter how many nonlinear terms we keep.
So we must accept that the model will never be precisely the same as the data.
For now, let's practice working with RSS and optimization problems to find the best linear models.