Standard Error

1. Standard Error

Previously, you built models and interpreted the model parameters, slope and intercept, in the context of real data. Notice that our two uses of models have been (1) computing predictions and (2) extracting parameter values, slope and intercept. You have computed quantitative measures of variation and "goodness" of the model *predictions*, but what about the variation or errors in the model *parameters*? How accurate are the model PARAMETERS, how much do those parameters vary, and how much of that variation is due to deterministic trends versus inherent randomness? In this lesson, instead of using a single value like RMSE that summarizes the entire model prediction, we will compute the standard error of each of the model parameters separately.

2. Uncertainty in Predictions

Previously, you've seen that the "best" model does not usually fit the data perfectly. The predictions of the model, even when the model fits the data well, may have a very large spread of residuals. RMSE measures that spread. The spread of a distribution is directly related to the measure of "uncertainty", and its inverse is often used as a measure of "precision". A distribution with a large spread is considered more uncertain, and less precise. A distribution with a small spread is considered less uncertain, and more precise. In this way, RMSE is a measure of prediction uncertainty.
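As a reminder of how RMSE summarizes the spread of residuals, here is a minimal sketch. The distance-versus-time numbers are made up for illustration, and `np.polyfit` is used here only as a convenient way to get a least-squares line; it is not necessarily the tool used earlier in the course.

```python
import numpy as np

# Hypothetical distance-versus-time data (invented for illustration)
times = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
distances = np.array([0.2, 1.9, 4.1, 5.8, 8.3, 9.9])

# Least-squares straight-line fit; for deg=1, polyfit returns [slope, intercept]
slope, intercept = np.polyfit(times, distances, deg=1)

# Residuals are the differences between the data and the model predictions
predictions = intercept + slope * times
residuals = distances - predictions

# RMSE: square root of the mean of the squared residuals
rmse = np.sqrt(np.mean(residuals ** 2))
print(f"RMSE = {rmse:.3f}")
```

A larger RMSE means a wider spread of residuals, and so a more uncertain, less precise prediction.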

3. Uncertainty in Parameters

In a similar way, standard error is a measure of the uncertainty in the model parameter values computed in the least-squares process. We now think of those "optimal" parameter values not as THE one true answer, but as the "best estimate". For example, if you have distance versus time data, you may be more interested in the model parameters, because one of them, the slope, is the car's speed. The value you compute for the model slope, when doing a least-squares fit, must be thought of as the center or "mean" of a DISTRIBUTION of parameter values, in this case, the slope or "speed". Standard error is a measure of the spread of that distribution, and in this case is interpreted as the "uncertainty in the speed estimate".
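One way to picture that distribution of slopes is to simulate it. The sketch below assumes a car with a known "true" speed and a made-up amount of measurement noise, refits a line to many noisy data sets, and looks at the spread of the fitted slopes. All the numbers and names (`true_speed`, `noise_sigma`, and so on) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed "true" values, chosen only for this illustration
true_speed = 2.0      # slope
true_offset = 0.0     # intercept
noise_sigma = 0.5     # standard deviation of the measurement noise

times = np.linspace(0.0, 10.0, 21)

# Repeat the "experiment" many times, fitting a new line each time
slopes = []
for _ in range(2000):
    noise = rng.normal(0.0, noise_sigma, size=times.size)
    distances = true_offset + true_speed * times + noise
    slope, intercept = np.polyfit(times, distances, deg=1)
    slopes.append(slope)
slopes = np.array(slopes)

# The center of the distribution is the best estimate of the speed;
# the spread plays the role of the standard error of the slope
print(f"mean of fitted slopes   = {slopes.mean():.3f}")
print(f"spread of fitted slopes = {slopes.std():.3f}")
```

Each single fit gives just one draw from this distribution. The standard error estimates the spread you would see if you could repeat the measurement many times.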

4. Computing Standard Errors

Computing standard error "by hand" is NOT easy, and indeed, often not possible. We'll spend much of the last chapter of this course using statistical inference, in part to help illustrate a computational approach. For now, we can use `statsmodels` to easily compute numerical values of standard error. Let's start where we left off the last time we used `statsmodels` to build and fit a model and extract model parameters. We started with a Pandas DataFrame, passed it into the `ols()` function from `statsmodels`, and called `.fit()`. When you execute `fit()`, the model parameter values are computed by `statsmodels` and are then available in the `params` container within the `model_fit` object. The intercept is accessed from `model_fit.params` using dictionary-style key indexing with the key named 'Intercept'. The slope is accessed from `model_fit.params` with the key named 'times', matching the column name from the input DataFrame.
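The steps just described might look like the sketch below. The data values are made up, and the column name `distances` and the variable names `a0` and `a1` are assumptions; only the `times` column name and the 'Intercept' key come from the lesson.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical distance-versus-time data (invented for illustration)
df = pd.DataFrame({
    "times": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "distances": [0.2, 1.9, 4.1, 5.8, 8.3, 9.9],
})

# Build the model from a formula and the DataFrame, then fit it
model_fit = ols(formula="distances ~ times", data=df).fit()

# Extract the parameter values with dictionary-style key indexing
a0 = model_fit.params["Intercept"]
a1 = model_fit.params["times"]
print(f"intercept = {a0:.3f}")
print(f"slope     = {a1:.3f}")
```

The key for the slope is whatever the predictor column is called in the formula, so a different column name would give a different key.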

5. Computing Standard Errors

But there's more! `statsmodels` has also computed the standard error for both the slope and intercept. The standard error of the intercept is accessed using the key 'Intercept' to index into `model_fit.bse`. The standard error of the slope is accessed using the key 'times' to index into `model_fit.bse`.
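A minimal sketch of reading the standard errors from a fitted model follows. As before, the data values, the `distances` column name, and the variable names `e0` and `e1` are assumptions made for illustration.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical distance-versus-time data (invented for illustration)
df = pd.DataFrame({
    "times": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "distances": [0.2, 1.9, 4.1, 5.8, 8.3, 9.9],
})

model_fit = ols(formula="distances ~ times", data=df).fit()

# Best estimates of the parameters
a0 = model_fit.params["Intercept"]
a1 = model_fit.params["times"]

# Standard errors of the parameters, indexed with the same keys
e0 = model_fit.bse["Intercept"]
e1 = model_fit.bse["times"]

print(f"intercept = {a0:.3f}, standard error = {e0:.3f}")
print(f"slope     = {a1:.3f}, standard error = {e1:.3f}")
```

Reporting each parameter together with its standard error makes it clear that the slope is a best estimate with some uncertainty, not an exact value.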

6. Let's practice!

To truly understand these uncertainties in our models, we must move past thinking of them as "errors" and begin thinking of them as probability distributions. Although we'll address this more directly in the next chapter, keep this in mind as you practice computing these "standard errors".
