
Linear regression

1. Linear regression

Linear regression assesses the straight-line relationship between a continuous independent variable and a continuous dependent variable.

2. Least squares

After collecting data and assessing group differences, and commonly after running correlations, regression in an AB design can assess the line of best fit while either ignoring or accounting for the groups. Least squares, the standard approach in linear regression, minimizes the residual sum of squares of the regression line: each residual is squared, then the squares are summed, and the line producing the smallest sum is chosen. The error of a linear regression is generally reported as the mean square error, the residual sum of squares divided by the number of data points. So the model is fit by minimizing the sum of squares, then evaluated using the mean square error.
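The least-squares fit and its error can be sketched in R. The data here are made up for illustration and are not the course's pizza dataset:

```r
# Hypothetical data standing in for the pizza example (values assumed)
enjoy <- c(3, 5, 6, 7, 9, 10)    # enjoyment ratings
time  <- c(12, 10, 9, 8, 6, 5)   # time to eat the pizza

fit <- lm(time ~ enjoy)          # lm() finds the least-squares line

rss <- sum(resid(fit)^2)         # residual sum of squares, minimized by lm()
mse <- rss / length(time)        # mean square error: RSS / number of points
rss
mse
```

Any other line through these points would produce a larger residual sum of squares than the one `lm()` returns.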

3. Linear regression model

We can fit a linear regression model of our time to eat pizza on enjoyment of pizza, ignoring groups, using the formula y tilde x, here time tilde enjoyment. To see the model table, pass the model to summary. The intercept estimate gives beta-zero, and the enjoyment estimate gives beta-one, the slope. The t-value gives how many standard errors the estimate lies from zero, and the next column gives the p-value of that t-value. Assess the model strength with R-squared, the proportion of variation in the outcome explained by the model, ranging from zero to one. A higher R-squared indicates a better fit. An R-squared of point-zero-nine indicates that nine percent of the variance in time is explained by enjoyment. Adjusted R-squared is preferred as it adjusts for the number of variables in the model. The F-statistic tests the overall relationship between the independent and dependent variables.
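A minimal sketch of this step in R, again with made-up data (the variable names Time and Enjoy are assumptions about the course dataset):

```r
# Hypothetical pizza data (values and names assumed)
Enjoy <- c(3, 5, 6, 7, 9, 10)
Time  <- c(12, 10, 9, 8, 6, 5)

model <- lm(Time ~ Enjoy)   # dependent ~ independent
summary(model)              # intercept (beta-zero), Enjoy slope (beta-one),
                            # t-values, p-values, R-squared, F-statistic
```

The coefficients table in the summary output holds the estimates, t-values, and p-values described above; R-squared, adjusted R-squared, and the F-statistic appear at the bottom.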

4. Assessing assumptions

A linear regression can be used when the relationship between the continuous independent and dependent variables is linear. The observations should be independent, with no dependency on one another, unlike time series data, where each point depends on the prior one. Homoscedasticity should also be assessed, meaning the residuals have consistent variance across the predicted values. We can check this using a plot of the fitted values of the model on x and the residuals of the model on y; the spread of residuals should be roughly constant across the plot. The residuals should also be approximately normally distributed. Create a qqplot of the residuals using qqnorm. If the values fall roughly in a straight line at a 45-degree angle, indicated in red with qqline, the residuals are normally distributed.
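These two diagnostic plots can be sketched as follows, using the same hypothetical data as before:

```r
# Hypothetical data (values and names assumed)
Enjoy <- c(3, 5, 6, 7, 9, 10)
Time  <- c(12, 10, 9, 8, 6, 5)
model <- lm(Time ~ Enjoy)

# Homoscedasticity: fitted values on x, residuals on y;
# the spread should look roughly constant across the plot
plot(fitted(model), resid(model))
abline(h = 0)

# Normality: points should fall near the red 45-degree line
qqnorm(resid(model))
qqline(resid(model), col = "red")
```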

5. Making predictions

Given that our model is acceptable, we can use it to predict new data. To determine the time it will take to eat the pizza given an enjoyment rating of 12, store the value under Enjoy, the original variable name, inside a data frame, then pass the model and the data frame to predict. With an enjoyment rating of 12, a subject will likely eat the Cheese pizza in a time of six-point-two-four. Multiple values can be predicted at once by including the points in Enjoy with c.
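A sketch of the prediction step, with made-up data so the predicted values here will differ from the six-point-two-four in the lesson:

```r
# Hypothetical data (values and names assumed)
Enjoy <- c(3, 5, 6, 7, 9, 10)
Time  <- c(12, 10, 9, 8, 6, 5)
model <- lm(Time ~ Enjoy)

# The new value must use the original variable name, inside a data frame
new_data <- data.frame(Enjoy = 12)
predict(model, new_data)

# Several points at once, combined with c()
predict(model, data.frame(Enjoy = c(8, 10, 12)))
```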

6. Including groups

We can run a multiple regression model that includes the AB groups. If there is reason to believe the groups may impact the dependent variable, they should be included. Add the group variable Topping to the right side of the formula using plus. The summary output now includes a row for the group variable, indicating that Cheese is used as the comparison group; this is particularly important with more than two groups, as each group is assessed relative to the comparison group. Again, we can predict, but we must now specify both Enjoy and Topping in the data frame passed to predict along with the multiple linear model.
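The multiple regression and its prediction can be sketched as below; the AB data are invented, and Cheese is the comparison group because it comes first alphabetically among the factor levels:

```r
# Hypothetical AB data (values and names assumed)
Enjoy   <- c(3, 5, 6, 7, 9, 10, 4, 6, 7, 8)
Time    <- c(12, 10, 9, 8, 6, 5, 13, 11, 10, 9)
Topping <- factor(rep(c("Cheese", "Pepperoni"), each = 5))

multi <- lm(Time ~ Enjoy + Topping)   # add the group with +
summary(multi)                        # Cheese (first level) is the comparison group

# Prediction must now supply both variables in the data frame
predict(multi, data.frame(Enjoy = 12, Topping = "Cheese"))
```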

7. Let's practice!

Let's practice.