Going beyond linear regression

1. Going beyond linear regression

Hi, my name is Ita and I welcome you to this course on generalized linear models or GLMs for short. GLMs provide a versatile framework for statistical modeling of data and are often used to solve practical problems. We will examine several such problems.

2. Course objectives

The main objectives of this course are learning the building blocks of GLMs: how to train them, interpret the model results, assess performance, and compute predictions. To accomplish these objectives, we will set the theoretical and computational basis in chapter 1, and cover logistic and Poisson regression in the remaining chapters. By the end of the course, you will have both a theoretical understanding and a working knowledge of GLMs.

3. Review of linear models

GLMs are a generalization of linear models. To understand this, suppose you would like to predict salary given years of experience. In regression terms, you would write this as salary is predicted by experience, where the tilde means "predicted by". More formally, our linear model would be written as follows,

4. Review of linear models

where y is the continuous response variable,

5. Review of linear models

x the explanatory variable,

6. Review of linear models

the betas are fixed, unknown parameters that we estimate, where beta_0 denotes the intercept and beta_1 the slope,

7. Review of linear models

and the random error term epsilon measures how much of the variation in the response is not explained by the explanatory variable.
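Putting the pieces from the preceding slides together, the simple linear model can be written compactly as:

```latex
y = \beta_0 + \beta_1 x + \varepsilon
```

Here beta_0 and beta_1 are estimated from the data, while epsilon captures the variation in y not explained by x.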

8. ols() and glm()

To fit linear models in Python we use the statsmodels ols function, which is imported from statsmodels dot formula dot api. Next, we initialize ols with formula and data arguments: the formula specifies the output and inputs, and data is the dataset containing the variables. Finally, the model is fitted by calling the fit method. The glm function is quite similar. It is also imported directly, and it takes one additional argument, family, which denotes the probability distribution of the response variable. More on this in the next lessons.

9. Assumptions of linear models

Using the ols function we obtain the linear fit. The regression function tells us how much the response variable y changes, on average, for a unit increase in x. The model assumptions are: linearity in the parameters; independent, normally distributed errors; and constant variance around the regression line for all values of x.

10. What if ... ?

But what if the response is not continuous but binary or count? Or the variance depends on the mean? Can we still fit a linear model?

11. Dataset - nesting of horseshoe crabs

To illustrate this, let's consider data on nesting horseshoe crabs. The data has four explanatory variables and two response variables, sat and y.

12. Linear model and binary response

We are interested in predicting the probability that there is at least one satellite crab near the female crab, given the female's weight.

13. Linear model and binary response

The response variable is binary, denoting Yes or 1 if the satellite is present and No or 0 otherwise.

14. Linear model and binary response

First we fit a linear model using ols function.

15. Linear model and binary response

Taking a weight of 5.2 and reading off the predicted probability, we see the fit is structurally wrong: it gives a value greater than 1, which is not possible for our data.

16. Linear model and binary data

To correct for this we fit a GLM, shown in blue, with the Binomial family, corresponding to binomial or logistic regression. Visually, there is a clear difference between the fitted models. Let's see what this means numerically.

17. Linear model and binary data

Now for the weight of 5.2 we obtain a probability of 0.99, which is consistent with binary data, since probabilities are bounded by 0 and 1.

18. From probabilities to classes

To obtain a binary class from the computed probabilities, we threshold the probabilities at, say, 0.5, which for the weight of 5.2 gives the Yes class. Similarly, for a weight of 1.5 we obtain the No class.

19. Let's practice!

Now let's review the concepts in exercises.