Limitations of linear models

1. Limitations of linear models

Hello, I'm Richard Erickson. I'm a data scientist. Welcome to this DataCamp course on generalized linear models in R. During this course, you'll expand your regression toolbox by learning about generalized linear models or GLMs for short.

2. Course overview

In chapter 1, you'll see a review of linear models, learn about their limitations, and see how GLMs overcome some of these limitations. You'll also learn about Poisson regression, a type of GLM. In chapter 2, you'll learn how to run binomial regressions. In chapter 3, you'll learn about interpreting and plotting GLMs. In chapter 4, you'll learn how to do multiple regression with GLMs. Now, let's see how GLMs compare to other models you likely know.

3. Workhorse of data science

Linear models are a workhorse in data science and include common tools such as linear regression, ANOVAs, and t-tests. They are used for everything from sports analytics to chemistry. Personally, I use them daily to analyze data.

4. Linear models

Linear models, or LMs, seek to explain variability by estimating coefficients for predictor variables. Intercepts model "average" or baseline effects of predictors. Slopes model the change in the response associated with continuous predictors. The equation to predict y has an intercept beta-naught, a slope beta-one multiplied by x, and an error term epsilon.
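
Written out, the equation described on this slide is usually given in the following form. The subscript i, which indexes individual observations, is an assumption added here for clarity and is not spoken in the narration.

```latex
y_i = \beta_0 + \beta_1 x_i + \epsilon_i
```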

5. Linear models in R

You can fit linear models in R with the lm() function. Notice that lm() takes the formula as its first input and the data as its second. The tilde may be read as "predicted by", for example, y is predicted by x.
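
As a minimal sketch of that call, here is a fit on simulated data. The data frame `dat`, its columns, and the coefficient values used to simulate it are made up for illustration and are not from the course.

```r
# Simulate a small dataset with a known linear relationship
# (hypothetical values, chosen only for illustration)
set.seed(1)
dat <- data.frame(x = 1:20)
dat$y <- 2 + 0.5 * dat$x + rnorm(20, sd = 1)

# Formula first, data second: "y is predicted by x"
fit <- lm(y ~ x, data = dat)

# Estimated intercept and slope
coef(fit)
```

The estimates should land near the simulated intercept of 2 and slope of 0.5, though not exactly on them because of the random error.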

6. Assumption of linearity

Linear models have important assumptions. By definition, the model examines linear relationships, such as the top example plot.

7. Assumption of normality

Additionally, linear models assume residuals are normally distributed, such as the right plot.

8. Assumption of continuous variables

Furthermore, linear models work best with continuous response variables, such as the data on the left. However, in real life, many datasets do not meet these assumptions.

9. Chick weights

The ChickWeight dataset works well with linear models. The dataset compares the weights of chicks fed four diets through time. I've grabbed the last observed weights and plotted the chick weights at the end of the study. In the next slide, you'll see how to fit a linear model to this data.

10. Chick diets impact on weight

The datasets package contains the ChickWeight dataset. I've saved the last observations as ChickWeightEnd. We will see if diet explains weight. Specifically, a linear model can examine whether the 2nd, 3rd, and 4th diets differ from the 1st diet. Using the lm() function, we use the formula "weight is predicted by Diet" with our ChickWeightEnd data. This model estimates a global intercept, labeled (Intercept), which corresponds to the average weight of a chick receiving the 1st diet, as well as the differences for the other diets. In this case, diets 2, 3, and 4 had chicks with higher weights than diet 1 by the amounts shown on the screen.
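
The fit described on this slide can be sketched as follows. How ChickWeightEnd was built is not shown in the narration, so the subsetting step below is an assumption: it keeps only the rows at the final time point.

```r
# ChickWeight ships with R's built-in datasets package
data("ChickWeight", package = "datasets")

# Assumption: "last observations" means the rows at the final time point
ChickWeightEnd <- ChickWeight[ChickWeight$Time == max(ChickWeight$Time), ]

# weight is predicted by Diet; Diet is a factor, so diet 1 is the baseline
chick_lm <- lm(weight ~ Diet, data = ChickWeightEnd)

# (Intercept) is the mean weight on diet 1;
# Diet2, Diet3, and Diet4 are differences from that baseline
coef(chick_lm)
```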

11. What about survivorship or counts?

However, what about other endpoints? For example, what if we have survival or count data? Survival is binary and counts are discrete. Neither of these is continuous, hence linear models are not a good choice! We need a new tool: the generalized linear model!

12. Generalized linear model

Linear models can be extended, or generalized, to become generalized linear models, or "GLMs" for short. Specifically, GLMs can have non-normal error distributions, although there are limitations to the possible distributions. For example, we can use a Poisson family for count data, which we will see later in this chapter. Or, we can use a binomial family for binary data such as survival data, which we will see in chapter 2. GLMs also have non-linear link functions, which link the regression coefficients to the distribution and allow the linear model to be generalized. We will explore link functions in chapter 2.
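
To make the two families concrete, here is a minimal sketch on simulated data. The data frame `sim`, its column names, and the coefficients used to generate it are hypothetical and are not from the course.

```r
# Simulated predictor (hypothetical data, for illustration only)
set.seed(42)
n <- 100
sim <- data.frame(x = runif(n, 0, 2))

# Count response: non-negative integers
sim$count <- rpois(n, lambda = exp(0.5 + 0.8 * sim$x))

# Binary response: 1 = survived, 0 = died
sim$survived <- rbinom(n, size = 1, prob = plogis(-1 + 1.5 * sim$x))

# Poisson family for counts, binomial family for binary outcomes
pois_fit  <- glm(count ~ x, data = sim, family = "poisson")
binom_fit <- glm(survived ~ x, data = sim, family = "binomial")

# Each family comes with a default link function
family(pois_fit)$link   # "log"
family(binom_fit)$link  # "logit"
```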

13. GLMs in R

GLMs are fit with the glm() function. Like lm(), glm() takes a formula and data as inputs, but it also has a family input. Gaussian is how R refers to the normal distribution, and it is the default family for glm(). Also, if the family is gaussian with its default identity link, then a glm() gives the same coefficient estimates as an lm().
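
One way to check the equivalence is to fit the same formula with both functions and compare the coefficients. This is a sketch; it rebuilds ChickWeightEnd under the assumption that "last observations" means the rows at the final time point.

```r
# Rebuild the end-of-study data (assumed to be the final time point)
data("ChickWeight", package = "datasets")
ChickWeightEnd <- ChickWeight[ChickWeight$Time == max(ChickWeight$Time), ]

# Same formula and data, fit with lm() and with a gaussian glm()
lm_fit  <- lm(weight ~ Diet, data = ChickWeightEnd)
glm_fit <- glm(weight ~ Diet, data = ChickWeightEnd, family = "gaussian")

# Compare the coefficient estimates up to numerical tolerance
all.equal(coef(lm_fit), coef(glm_fit))
```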

14. Let's practice!

Now that you've seen a little bit about GLMs, let's use them with some data!