How to build a GLM?

1. How to build a GLM?

Now you know that linear models are not well suited to response data that are not continuous, and that they can give rather strange results for such data. In this lesson, you will learn how to overcome these problems elegantly with GLMs, which provide a unified framework for modeling data whose distribution belongs to the exponential family, which includes the Gaussian, Binomial, and Poisson distributions, among others.

2. Components of the GLM

A GLM has three components. The first is the random component, which defines the response variable y and its probability distribution. As we saw previously, there are different response data types to consider, depending on your data problem. One important assumption here is that the observations y_1 through y_n are independent.

3. Components of the GLM

The second component is the systematic component, which defines the explanatory variables to include in the model. We can include p different explanatory variables.

4. Components of the GLM

Note that it allows for interaction effects, where the effect of one variable depends on the level of another and vice versa,

5. Components of the GLM

or curvilinear effects, etc. Note that the right-hand side (RHS) is still a linear combination of the explanatory variables and of terms built from them.
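The systematic component described above can be sketched in a few lines of Python. This is only an illustration: the function name `linear_predictor`, the coefficient names, and all numeric values are hypothetical and do not come from the lesson. It shows a predictor with two explanatory variables, one interaction term, and one squared (curvilinear) term, and checks that the effect of `x1` changes with the level of `x2`.

```python
def linear_predictor(x1, x2, beta):
    """Linear combination of terms built from x1 and x2.

    beta = (b0, b1, b2, b12, b11): intercept, two main effects,
    an interaction coefficient, and a quadratic coefficient.
    All names here are illustrative assumptions.
    """
    b0, b1, b2, b12, b11 = beta
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2 + b11 * x1 ** 2


# Hypothetical coefficients, chosen only for the example
beta = (1.0, 0.5, -2.0, 0.25, 0.1)

eta = linear_predictor(2.0, 3.0, beta)

# Interaction: the change in the predictor when x1 goes from 0 to 1
# is different at x2 = 0 and at x2 = 4.
effect_low = linear_predictor(1.0, 0.0, beta) - linear_predictor(0.0, 0.0, beta)
effect_high = linear_predictor(1.0, 4.0, beta) - linear_predictor(0.0, 4.0, beta)
```

Even with the product and squared terms, the predictor stays linear in the coefficients: doubling every entry of `beta` doubles the result.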

6. Components of the GLM

The third and final component is the link function, which connects the random and systematic components. It is a function of the expected value of the response variable, chosen so that the model is linear in the parameters. By construction, it allows the mean of the response variable to be nonlinearly related to the explanatory variables. It is the link function that generalizes the linear model. Note that the choice of link function is separate from the choice of random component. Let's review the most common data types and how they are represented in the GLM framework.
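Taken together, the three components can be written compactly as follows. The symbols $\eta_i$, $\beta_j$, $x_{ij}$, and $g$ are notation assumed here for illustration; the transcript itself only names y, the mean mu, and p explanatory variables.

```latex
\begin{aligned}
\text{Random component:}\quad & y_i \sim \text{exponential family with mean } \mu_i = E[y_i], \quad y_1, \dots, y_n \text{ independent} \\
\text{Systematic component:}\quad & \eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} \\
\text{Link function:}\quad & g(\mu_i) = \eta_i
\end{aligned}
```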

7. Continuous $\rightarrow$ Linear Regression

One data type we are all very familiar with is continuous, approximately normally distributed data whose domain is the real line. Examples include house prices, salary levels, and a person's height. When fitting a GLM, we would use the Gaussian distribution family, where the link function is the identity. The identity link is the simplest one: it simply equals mu, the mean of the response. This specifies the linear regression model, where y is assumed continuous and normally distributed. Linear regression is therefore a special case of the GLM.
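To make the special case concrete: with a Gaussian response and the identity link, the maximum likelihood estimates coincide with ordinary least squares. The sketch below fits a single explanatory variable using the closed-form least squares formulas. The function name `fit_identity_gaussian` and the toy data are assumptions made for the example, not material from the lesson.

```python
def fit_identity_gaussian(x, y):
    """Least squares fit of mu = intercept + slope * x (identity link)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_identity_gaussian(x, y)
```

Because the data here contain no noise, the fit recovers the intercept 1 and slope 2 that generated them.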

8. Binary $\rightarrow$ Logistic regression

Another data type we encounter quite often is binary data, i.e. data with two possible classes, usually denoted 0/1, where 1 is true and 0 is false. To fit a GLM, you should use the Binomial distribution, where the default link function is the logit. Models of this form are called logistic regression, which we will cover in chapters 2 and 4.
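As a rough illustration of what the logit link does, the sketch below maps a linear predictor to a probability and back. The coefficients `beta0` and `beta1` and the input values are hypothetical; in practice they would be estimated from data.

```python
import math


def logit(p):
    """Logit link: log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Inverse link: maps any real linear predictor to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients, for illustration only
beta0, beta1 = -1.5, 0.8

xs = [-5.0, 0.0, 2.0, 10.0]
probs = [inverse_logit(beta0 + beta1 * x) for x in xs]
```

However extreme the explanatory variable gets, the predicted mean stays strictly between 0 and 1, which a straight-line fit to 0/1 data cannot guarantee.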

9. Count $\rightarrow$ Poisson regression

Count data are non-negative integers; examples include the number of hurricanes, the number of bike crossings on a bridge, etc. To fit a GLM, you should use the Poisson distribution, where the default link function is the logarithm. Models of this form are called Poisson regression, which we will cover in detail in chapter 3.
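For readers curious how such a model might be estimated, here is a minimal sketch of a Poisson regression with a log link and one explanatory variable, fitted by Newton-Raphson on the log-likelihood. This is one common approach, not necessarily what any particular library does, and the function name `fit_poisson` and the toy counts are assumptions made for the example.

```python
import math


def fit_poisson(x, y, n_iter=25):
    """Newton-Raphson fit of log(mu) = b0 + b1 * x for count data y."""
    b0 = math.log(sum(y) / len(y))  # start at the log of the sample mean
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: X^T (y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix X^T diag(mu) X, stored as [[a, b], [b, d]]
        a = sum(mu)
        b = sum(mi * xi for xi, mi in zip(x, mu))
        d = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = a * d - b * b
        # Newton step: solve the 2x2 system explicitly
        b0 += (d * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1


# Hypothetical counts that grow roughly exponentially with x
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 7, 12]

b0, b1 = fit_poisson(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
```

Because the mean is `exp(b0 + b1 * x)`, every fitted value is positive, and a one-unit increase in `x` multiplies the expected count by `exp(b1)`.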

10. Link functions

For reference here is the list of the main link functions and their usage in model fitting.
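The slide's list is not reproduced in this transcript. The three family and link pairs named in this lesson can be written as below, with $g$ denoting the link function (notation assumed here) and $\mu$ the mean of the response.

```latex
\begin{aligned}
\text{Gaussian, identity link:}\quad & g(\mu) = \mu \\
\text{Binomial, logit link:}\quad & g(\mu) = \log\frac{\mu}{1 - \mu} \\
\text{Poisson, log link:}\quad & g(\mu) = \log \mu
\end{aligned}
```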

11. Benefits of GLMs

Let's summarize the concepts introduced in this video. As you learned, GLMs unify many different types of response variable whose distributions belong to the exponential family. The link function transforms the expected value of y, not y itself, and it keeps the model a linear combination in the parameters, so many techniques from linear models apply to GLMs as well. We will see the details in later lessons.
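The distinction between transforming the expected value of y and transforming y itself can be seen numerically. In the small sketch below, the counts are hypothetical; it compares the log of the mean with the mean of the logs.

```python
import math

# Hypothetical positive counts, for illustration only
y = [1, 2, 4, 9]

# What a log link works with: the log of the expected value
log_of_mean = math.log(sum(y) / len(y))

# What transforming the data first would give: the mean of log(y)
mean_of_logs = sum(math.log(v) for v in y) / len(y)
```

The two quantities differ, and the mean of the logs is the smaller one here. Transforming the data directly also breaks down as soon as a count of zero appears, since `math.log(0)` raises a `ValueError`, whereas the log of a positive mean is always defined.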

12. Let's practice

Time for some practice problems.
