1. Logistic regression model
Great job on the linear regression exercises! You are getting more and more prepared for your interview! During the interview, you might be tasked with predicting a binary value. To prepare you for that, we'll review logistic regression.
2. Logistic regression's application
Logistic regression models can be used, for example, to recognize whether a given e-mail is spam or whether a given transaction is fraudulent.
3. Binary response variable
Take a look at the plot. The response variable takes only two values: zero and one.
4. Binary response variable
If we fit a linear model to these data, the predictions may take any value, such as 0.5 or 1.2. We know that the response variable is binary, so it makes more sense to estimate the probability that the response variable takes one of the two values.
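To see why a linear model is a poor fit here, the following is a minimal sketch on simulated data. The variables x and y, the seed, and the coefficient used to simulate the data are all made up for illustration and are not from the lesson.

```r
# Simulated data (hypothetical): a binary response y that depends on x
set.seed(42)
x <- seq(-3, 3, length.out = 50)
y <- rbinom(50, size = 1, prob = plogis(2 * x))

# Fit an ordinary linear model to the binary response
linear_fit <- lm(y ~ x)

# Predict at a few x values, including some beyond the observed range
linear_pred <- predict(linear_fit, newdata = data.frame(x = c(-6, 0, 6)))
print(linear_pred)
```

The predictions at the extreme x values should fall below 0 and above 1, which cannot be read as probabilities.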
5. Logistic function
The probability is modeled with a logistic function of the following form. Take a look at the formula for a second. If the exponential term in the denominator increases, the function gets closer to zero. If it decreases, the function approaches one.
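The slide's formula is not reproduced in this transcript. A common way to write the logistic function for a single predictor x, assuming an intercept \beta_0 and a slope \beta_1 (symbols chosen here for illustration), is:

```latex
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
```

In this form, as the term e^{-(\beta_0 + \beta_1 x)} grows, the denominator grows and p moves toward zero. As that term shrinks toward zero, the denominator moves toward one, and so does p.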
6. Logistic function
The logistic function only takes values between 0 and 1, and so does a probability.
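A quick way to check this range is to evaluate the function on a few inputs. This is a small sketch; the helper name logistic and the input values are chosen here for illustration.

```r
# Hypothetical helper: the standard logistic function
logistic <- function(z) 1 / (1 + exp(-z))

# Evaluate at strongly negative, zero, and strongly positive inputs
z <- c(-10, -2, 0, 2, 10)
p <- logistic(z)
print(round(p, 4))
```

Every value stays strictly between 0 and 1, and an input of zero gives exactly 0.5. Base R also provides the same function as plogis.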
7. Logistic regression model
A logistic regression model estimates the value of p, which is the probability that y equals 1. This formula can also be rewritten in the following form, in which we predict a logit. The logit is the logarithm of the odds. The odds of an event are the ratio of the probability that the event takes place to the probability that it does not.
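The logit form is not shown in this transcript. Assuming a single predictor x with intercept \beta_0 and slope \beta_1 (symbols chosen here for illustration), it is usually written as:

```latex
\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x
```

For example, a probability of 0.8 corresponds to odds of 0.8 / 0.2 = 4, and the logit is the natural logarithm of 4.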
8. Prediction
The logistic regression model returns a probability, a value from 0 to 1.
9. Prediction
To turn the probability into a prediction, we need to set a threshold. As a rule of thumb, if the returned value is above 0.5, we predict one. Otherwise, we predict zero.
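The thresholding rule can be sketched in a couple of lines. The probabilities below are made-up values for illustration, not model output from the lesson.

```r
# Hypothetical predicted probabilities
probs <- c(0.12, 0.50, 0.51, 0.93)

# Rule of thumb: predict 1 when the probability is above 0.5, otherwise 0
threshold <- 0.5
pred_class <- ifelse(probs > threshold, 1, 0)
print(pred_class)
```

Note that a probability of exactly 0.5 is mapped to zero under this rule, because the comparison is strict.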
10. Logistic regression in R
To fit a logistic regression in R, you can use the glm function. You need to specify the formula and the data and set the parameter family to binomial.
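As a minimal sketch of such a call, the example below uses R's built-in mtcars dataset, which is an assumption made here for illustration and not the data from the lesson. The binary column am (transmission type) is modeled as a function of the car's weight wt.

```r
# mtcars ships with R; am is a 0/1 column, wt is weight in 1000 lbs
# (this dataset is chosen for illustration, not taken from the lesson)
model <- glm(am ~ wt, data = mtcars, family = binomial)

# Inspect the estimated intercept and slope (on the logit scale)
print(coef(model))
```

Without family = binomial, glm would default to a Gaussian family and fit an ordinary linear model instead.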
11. Logistic regression in R
To predict the values, apply the predict function to the model you've fitted and specify the new data. Set the type parameter to "response" to return the probabilities.
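A sketch of the prediction step might look as follows. It again assumes the built-in mtcars dataset and a hypothetical new_cars data frame, neither of which comes from the lesson.

```r
# Fit the model first (mtcars is used for illustration only)
model <- glm(am ~ wt, data = mtcars, family = binomial)

# Hypothetical new observations; column names must match the predictors
new_cars <- data.frame(wt = c(2.0, 3.0, 4.0))

# type = "response" returns probabilities rather than logits
probs <- predict(model, newdata = new_cars, type = "response")
print(probs)
```

If the type argument is left out, predict returns values on the logit scale, which can fall outside the range from 0 to 1.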
12. Summary
To summarize, we've covered the logistic regression model, the prediction of a binary response variable, and logistic regression in R with the glm function.
13. Let's practice!
Now that we've reviewed the theory, let's practice!