1
Exploring Linear Trends
We start the course with an initial exploration of linear relationships, including some motivating examples of how linear models are used, and demonstrations of data visualization methods from matplotlib. We then use descriptive statistics to quantify the shape of our data and use correlation to quantify the strength of linear relationships between two variables.
2
Building Linear Models
Here we look at the parts that go into building a linear model. Using the concept of a Taylor series, we focus on the slope and intercept parameters, how they define the model, and how to interpret them in several applied contexts. We then apply a variety of Python modules to find the model that best fits the data by computing the optimal values of slope and intercept, using least-squares, numpy, statsmodels, and scikit-learn.
3
Making Model Predictions
Next we apply models to real data and make predictions. We explore some of the most common pitfalls and limitations of predictions, and we evaluate and compare models by quantifying and contrasting several measures of goodness-of-fit, including RMSE and R-squared.
4
Estimating Model Parameters
In our final chapter, we introduce concepts from inferential statistics, and use them to explore how maximum likelihood estimation and bootstrap resampling can be used to estimate linear model parameters. We then apply these methods to make probabilistic statements about our confidence in the model parameters.

Variation Around the Trend

The data need not be perfectly linear: there may be some random variation, or "spread", in the measurements, and that spread translates into variation in the model parameters. The variation in each parameter is quantified by its "standard error", which is interpreted as the "uncertainty" in the estimate of that parameter.
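To make the link between spread in the data and standard error concrete, here is a minimal sketch using the textbook formulas for simple least-squares regression. The helper name `linear_fit_with_standard_errors` and the synthetic `times`/`distances` arrays are assumptions made for illustration; they are not the course's preloaded data.

```python
import numpy as np

def linear_fit_with_standard_errors(x, y):
    """Least-squares slope and intercept with their standard errors (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_mean = x.mean()
    sxx = np.sum((x - x_mean) ** 2)

    # Ordinary least-squares estimates of the two parameters
    slope = np.sum((x - x_mean) * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x_mean

    # Residual variance, using n - 2 degrees of freedom (two fitted parameters)
    residuals = y - (intercept + slope * x)
    sigma2 = np.sum(residuals ** 2) / (n - 2)

    # Standard errors of the parameter estimates
    se_slope = np.sqrt(sigma2 / sxx)
    se_intercept = np.sqrt(sigma2 * (1.0 / n + x_mean ** 2 / sxx))
    return slope, intercept, se_slope, se_intercept

# Synthetic data (an assumption): same underlying trend, two noise levels
rng = np.random.default_rng(42)
times = np.linspace(0.0, 10.0, 50)
distances_low_noise = 5.0 + 2.0 * times + rng.normal(0.0, 0.5, size=times.size)
distances_high_noise = 5.0 + 2.0 * times + rng.normal(0.0, 5.0, size=times.size)

fit_low = linear_fit_with_standard_errors(times, distances_low_noise)
fit_high = linear_fit_with_standard_errors(times, distances_high_noise)

print("low noise : slope = {:.3f} +/- {:.3f}".format(fit_low[0], fit_low[2]))
print("high noise: slope = {:.3f} +/- {:.3f}".format(fit_high[0], fit_high[2]))
```

With more spread in the measurements, the same fitting procedure should report a larger standard error for both parameters, which is the sense in which the standard error expresses uncertainty.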

In this exercise, you will use ols from statsmodels to build a model and extract the standard error for each parameter of that model.

Store the preloaded data in a DataFrame df, labeling x_data as times and y_data as distances.
Use model_fit = ols().fit() to fit a linear model of the form formula="distances ~ times" to the data=df.
Extract the estimated intercept from model_fit.params['Intercept'] and the standard error of the intercept from model_fit.bse['Intercept'].
Repeat for the slope, and then print all 4 with meaningful names.
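The steps above can be sketched as follows. Because the preloaded `x_data` and `y_data` are not available outside the exercise, this sketch generates synthetic stand-ins (an assumption), and the variable names used for the printed values are illustrative choices only.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-ins for the preloaded x_data and y_data (an assumption)
rng = np.random.default_rng(0)
x_data = np.linspace(0.0, 10.0, 100)
y_data = 5.0 + 2.0 * x_data + rng.normal(0.0, 1.0, size=x_data.size)

# Store the data in a DataFrame with the column names used in the formula
df = pd.DataFrame({"times": x_data, "distances": y_data})

# Fit the linear model distances ~ times
model_fit = ols(formula="distances ~ times", data=df).fit()

# Parameter estimates and their standard errors;
# the slope is indexed by the name of the predictor column, "times"
intercept = model_fit.params["Intercept"]
intercept_error = model_fit.bse["Intercept"]
slope = model_fit.params["times"]
slope_error = model_fit.bse["times"]

print("Estimate of the intercept = {:.4f}".format(intercept))
print("Uncertainty of the intercept = {:.4f}".format(intercept_error))
print("Estimate of the slope = {:.4f}".format(slope))
print("Uncertainty of the slope = {:.4f}".format(slope_error))
```

Both `params` and `bse` are pandas Series indexed by term name, so the same key retrieves an estimate and its standard error.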
