1. Logistic regression
Logistic regression is non-linear: it relates the independent variable to the probability of a binary outcome rather than to the outcome itself.
2. Log in logistic
A logistic regression is ideal when the dependent variable is a
binary response in our AB design. For example, we can assess whether the pizza would be eaten again, either considering or ignoring our topping groups. Note the category column is coded with zero and one rather than strings, with zero indicating the pizza would not be eaten again and one indicating it would. Because the outcome is categorical,
we derive a probability of eating the pizza again rather than predicting the outcome directly.
The predicted dependent variable is given as a function of the odds of success:
the probability of the pizza being eaten again divided by the probability of failure, the probability the pizza will not be eaten again. Since the log of these odds is
assumed linearly related to the independent variable, logistic regression is often described as a transformation of linear regression.
Now, instead of y-hat, the left side of our equation is the log odds of the probability, set equal to beta-zero plus beta-one X. In estimating an outcome probability,
the error is handled through a loss function on the log odds, most commonly maximum likelihood, which computes multiple candidate values of beta and chooses the estimates that maximize the likelihood.
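For reference, the relationship described above can be written out; this is a minimal sketch in standard notation, where p is the probability of eating the pizza again:

```latex
\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
```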
3. Logistic regression model
Create the model
using glm, written as dependent variable tilde independent variable. The glm function estimates the logistic regression model, optimizing the regression coefficients. In our pizza case, the formula is eat-again tilde enjoyment, which ignores grouping. Specify the data frame pizza and set the family to binomial to request the logistic probability function. View the model table with summary.
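A minimal sketch of this call, assuming the data frame is named pizza with columns eat_again (coded zero and one) and enjoyment:

```r
# Fit the logistic regression ignoring topping groups:
# eat_again (0 = would not eat again, 1 = would eat again) ~ enjoyment
model_simple <- glm(eat_again ~ enjoyment,
                    data = pizza,
                    family = binomial)

# View the coefficient table, z-values, p-values, and deviances
summary(model_simple)
```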
The log odds of eating the pizza again with zero enjoyment is negative-four-point-two-four, the intercept estimate. The enjoyment estimate indicates that for every unit increase in enjoyment, the log odds of eating the pizza again increases by point-nine-six. The z-value assesses whether the parameter is zero. The respective p-value less than point-zero-five indicates the increase in enjoyment impacts the chance of eating the pizza again.
To determine if the model is a good fit,
assess the difference between null and residual deviance.
Subtract the residual deviance, 31-point-one-nine-nine, from the null deviance, 96-point-two-zero-four, to determine the chi-square value, and subtract the residual degrees of freedom from the null degrees of freedom for the chi-square degrees of freedom.
In p-chi-sq, use the chi-value as q, degrees of freedom as df, and set lower-dot-tail to FALSE. This gives the p-value for the chi-square goodness of fit.
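A minimal sketch of this check, reusing the eat_again model from above; the deviance and degrees-of-freedom values are pulled from the fitted glm object rather than typed in by hand:

```r
# Difference between null and residual deviance gives the chi-square statistic
chi_value <- model_simple$null.deviance - model_simple$deviance

# Difference in degrees of freedom for the chi-square test
chi_df <- model_simple$df.null - model_simple$df.residual

# Upper-tail p-value for the chi-square goodness-of-fit test
pchisq(q = chi_value, df = chi_df, lower.tail = FALSE)
```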
With our p-value of less than point-zero-five, the model without groups is useful for predicting the probability.
4. Predicting data
Since the model is useful, we can create a data frame with a column for the independent variable, enjoyment, and a row for each of the seven values we will predict for. Call the model and new data frame in predict, specifying type as response to denote that predicted probabilities, rather than log odds, are wanted.
The output is the probability that a response of one, or yes, will be received irrespective of groups, since the model did not account for the groups.
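A minimal sketch of this step; the seven enjoyment values predicted in the lesson are not listed, so the sequence one through seven below is only illustrative:

```r
# New data frame with a column for the independent variable, enjoyment,
# and a row for each value we want a prediction for (assumed here as 1 through 7)
new_pizza <- data.frame(enjoyment = 1:7)

# type = "response" returns predicted probabilities rather than log odds
predict(model_simple, newdata = new_pizza, type = "response")
```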
5. Including groups
Given we have AB groups, we can also include them as an independent variable. Repeat the glm formula, adding Topping to the right side. The model summary table now gives a row for the topping, indicating each group is compared to the Cheese pizza. The estimate in this row indicates that eating the Pepperoni pizza versus the Cheese pizza changes the log odds of eating the pizza again by three-point-four-seven.
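A minimal sketch of the grouped model, assuming the grouping column is a factor named topping with Cheese as its reference level:

```r
# Add the topping group as a second independent variable;
# R treats the first factor level (assumed here to be Cheese) as the reference
model_groups <- glm(eat_again ~ enjoyment + topping,
                    data = pizza,
                    family = binomial)

# Each non-reference topping gets its own row in the coefficient table
summary(model_groups)
```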
6. Let's practice!
Let's practice logistic regressions.