Logistic regression
1. Logistic regression
This is the final lesson of the course, and we're going to work with the logistic regression model. Suppose a university has given you data on students' test scores and the hours they studied before the test. The logistic model will let you predict, from the hours of study, how likely a student is to pass the test, and classify each student as a pass or a fail. Let's get started!
2. Original data
A plot of a sample of the original data looks like this. For this first exercise, suppose the data doesn't give the actual scores. Instead, along with the hours of study, it only tells us whether each student passed or failed the test.
3. New data
So, for each student you have the hours of study and one of only two possible values: pass or fail.
4. Where would you draw the line?
In this case, where would you draw the line that separates pass from fail based on the hours of study? You could put it at 10, 11, or even 12 hours, but wherever you draw it, you will misclassify some values.
5. Solution based on probability
To draw the line, we need a function that gives a probability based on the hours of study. The model will return a probability, a value between 0 and 1, for each number of hours of study. This means we have to change our scale: instead of predicting pass or fail directly, we predict the probability of passing.
6. The logistic function
The function we need is the logistic function, also called the sigmoid. This function returns values between 0 and 1, and its input is a linear model built from a slope and an intercept. So we pass it a linear model, and the logistic function returns probabilities.
7. Changing the slope
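Here is a minimal sketch of that idea in plain Python (standard library only), using the names beta0 (intercept) and beta1 (slope) that the lesson adopts next. The function names `logistic` and `pass_probability` are hypothetical, and the parameters in the loop (beta0 = -22, beta1 = 2) are made-up values for illustration, not fitted to the lesson's data.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number z into the interval (0, 1)."""
    # Two algebraically equivalent forms; choosing by sign avoids overflow in math.exp
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def pass_probability(hours, beta0, beta1):
    """Feed the linear model beta0 + beta1 * hours into the logistic function."""
    return logistic(beta0 + beta1 * hours)

# Hypothetical parameters, chosen only to illustrate the shape of the curve
for h in (8, 10, 12, 14):
    print(h, round(pass_probability(h, beta0=-22.0, beta1=2.0), 3))
```

Whatever the linear model outputs, from large negative to large positive values, the result always stays between 0 and 1, which is what lets us read it as a probability.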
From now on, we will call the slope of the linear model beta1 and the intercept beta0. If we study the effect of the parameters, we can see that increasing beta1 (the slope) makes the logistic function steeper, so the predicted probability switches from near 0 to near 1 over a narrower range of x.
8. Changing the intercept
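The slope effect described above can be checked numerically. In this sketch, the helper `probability_rise` is hypothetical, and beta0 is set to -beta1 * 11 so that the 0.5 point stays at a made-up 11 hours while only the steepness changes. It measures how much the pass probability rises between 10 and 12 hours for several slopes.

```python
import math

def logistic(z):
    """Numerically stable logistic (sigmoid) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probability_rise(beta1, midpoint=11.0, width=1.0):
    """Rise in P(pass) from (midpoint - width) to (midpoint + width) hours.

    beta0 is chosen as -beta1 * midpoint so the curve stays centred on the
    (hypothetical) midpoint; only the slope beta1 varies.
    """
    beta0 = -beta1 * midpoint
    low = logistic(beta0 + beta1 * (midpoint - width))
    high = logistic(beta0 + beta1 * (midpoint + width))
    return high - low

for beta1 in (0.5, 1.0, 2.0, 4.0):
    print(f"beta1={beta1}: P(pass) rises by {probability_rise(beta1):.3f} between 10 and 12 hours")
```

The rise grows with beta1. Equivalently, the slope of the curve at its 0.5 point is beta1 / 4, because the derivative of the logistic function is p(1 - p), which equals 0.25 when p = 0.5.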
On the other hand, adjusting beta0 (the intercept) shifts the function left or right along the x-axis. This is how you position the line: the probability equals 0.5 where beta0 + beta1 * x = 0, that is, at x = -beta0 / beta1.
9. From data to probability
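A small sketch of where the line ends up: the hypothetical helper `decision_boundary` solves beta0 + beta1 * x = 0 for x, and the loop keeps a made-up slope of 2 while trying three intercepts.

```python
import math

def logistic(z):
    """Numerically stable logistic (sigmoid) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def decision_boundary(beta0, beta1):
    """Hours of study at which P(pass) = 0.5.

    logistic(beta0 + beta1 * x) equals 0.5 exactly when beta0 + beta1 * x = 0,
    so x = -beta0 / beta1 (requires beta1 != 0).
    """
    return -beta0 / beta1

# Hypothetical parameters: same slope, three different intercepts
beta1 = 2.0
for beta0 in (-20.0, -22.0, -24.0):
    x_star = decision_boundary(beta0, beta1)
    print(f"beta0={beta0}: boundary at {x_star} hours, P(pass) there = {logistic(beta0 + beta1 * x_star)}")
```

With the slope fixed at 2, moving beta0 from -20 to -24 shifts the boundary from 10.0 to 12.0 hours, and the probability at the boundary is 0.5 each time.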
So, by applying the logistic model's parameters, we get the probability of passing the test for each value of hours of study.
10. Outcomes
We can say that if the probability is higher than 0.5, the outcome is a pass; otherwise, it's a fail. The model's predicted outcomes are shown in red.
11. Misclassifications
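The 0.5 rule described above can be sketched as follows. The helper `predict_outcomes`, the parameters (beta0 = -22, beta1 = 2), and the study hours are hypothetical, chosen for illustration only.

```python
import math

def logistic(z):
    """Numerically stable logistic (sigmoid) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_outcomes(hours_list, beta0, beta1, threshold=0.5):
    """Return (probability, outcome) pairs; 'pass' only if the probability is above the threshold."""
    results = []
    for hours in hours_list:
        p = logistic(beta0 + beta1 * hours)
        results.append((p, "pass" if p > threshold else "fail"))
    return results

# Hypothetical parameters and study hours
hours_list = [9, 11, 13]
for hours, (p, outcome) in zip(hours_list, predict_outcomes(hours_list, beta0=-22.0, beta1=2.0)):
    print(f"{hours} h: P(pass) = {p:.3f} -> {outcome}")
```

With these parameters the probability at 11 hours is exactly 0.5, which is not higher than 0.5, so that student is classified as a fail.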
But if we compare the model's predictions with the actual outcomes, we can see some misclassifications between 11 and 12 hours of study. Based on the model, if a student studies less than 10 hours, their probability of passing the test is very low, and if they study 13 hours or more, their probability of passing is high. Now let's write some code.
12. Logistic regression
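Comparing predictions with actual outcomes, as described above, can be sketched like this. The helpers `misclassifications` and `accuracy` are hypothetical, and the hours and outcomes are made-up values (1 = pass, 0 = fail), not the lesson's data.

```python
def misclassifications(hours, actual, predicted):
    """Return the hours values where the predicted outcome differs from the actual one."""
    return [h for h, a, p in zip(hours, actual, predicted) if a != p]

def accuracy(actual, predicted):
    """Fraction of students whose outcome was predicted correctly."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical data; 1 = pass, 0 = fail
hours = [9, 10, 11, 11.5, 12, 13]
actual = [0, 0, 1, 0, 0, 1]
predicted = [0, 0, 0, 0, 1, 1]  # e.g. from a 0.5 probability threshold

print("Misclassified at hours:", misclassifications(hours, actual, predicted))
print("Accuracy:", accuracy(actual, predicted))
```

Here the two errors fall at 11 and 12 hours, the region where the classes overlap.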
To fit a logistic regression model, we will use scikit-learn, in particular the LogisticRegression class. We create our model with LogisticRegression and pass the C parameter as 1e9. C is the inverse of the regularization strength, so a very large value like this effectively turns regularization off and gives us the plain, unpenalized logistic regression estimates. Then we call model.fit with our data. We create variables to hold the parameters from model.coef_ and model.intercept_. These attributes are arrays, so we extract the values from them. Finally, we print the values.
13. Predicting outcomes based on hours of study
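A sketch of these steps with scikit-learn, assuming NumPy and scikit-learn are installed. The hours and outcomes below are hypothetical toy values (1 = pass, 0 = fail), not the lesson's dataset, so the fitted numbers will differ from the ones in the video. `max_iter=1000` is an addition of this sketch to give the solver room to converge on the unscaled hours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours of study (one column) and outcome (1 = pass, 0 = fail)
hours = np.array([8, 9, 9.5, 10, 10.5, 11, 11, 11.5, 12, 12, 12.5, 13, 13.5, 14]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1])

# C is the inverse of the regularization strength; a huge C makes the penalty negligible
model = LogisticRegression(C=1e9, max_iter=1000)
model.fit(hours, passed)

# coef_ has shape (1, n_features) and intercept_ has shape (1,), so index into them
beta1 = model.coef_[0][0]
beta0 = model.intercept_[0]
print("beta1 (slope):", beta1)
print("beta0 (intercept):", beta0)
print("P(pass) = 0.5 at", -beta0 / beta1, "hours")
```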
If we want to predict the outcome for a given number of hours of study, we pass the hours of study to model.predict (as a 2D array with one row per value) and get the predicted outcome. Notice that the outcomes come back as an array, so we can pass many values at once and get an array with the outcome for each one.
14. Probability calculation
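A minimal sketch of model.predict on several values at once, refitting on the same hypothetical toy data (not the lesson's dataset) so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours of study and outcome (1 = pass, 0 = fail)
hours = np.array([8, 9, 9.5, 10, 10.5, 11, 11, 11.5, 12, 12, 12.5, 13, 13.5, 14]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1])

model = LogisticRegression(C=1e9, max_iter=1000).fit(hours, passed)

# predict expects a 2D input: one row per student, one column per feature
new_hours = np.array([[8.0], [10.0], [13.0], [14.0]])
outcomes = model.predict(new_hours)  # one predicted class per row
print(outcomes)
```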
If instead you want the probability of passing for a particular number of hours of study, you can use model.predict_proba. You pass it an array with the values you want to evaluate, and it returns one column per class, ordered as in model.classes_; with fail coded as 0 and pass as 1, the second column holds the probability of passing. For 9 hours, the probability of passing is approximately 0.05.
15. Let's practice!
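A sketch of model.predict_proba, again on the hypothetical toy data used for illustration; because the data differ from the lesson's, the probability at 9 hours will not match the 0.05 quoted above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours of study and outcome (1 = pass, 0 = fail)
hours = np.array([8, 9, 9.5, 10, 10.5, 11, 11, 11.5, 12, 12, 12.5, 13, 13.5, 14]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1])

model = LogisticRegression(C=1e9, max_iter=1000).fit(hours, passed)

proba = model.predict_proba(np.array([[9.0], [13.0]]))
# One column per class, ordered as model.classes_ (here [0, 1]), so column 1 is P(pass)
print(model.classes_)
print(proba[:, 1])
```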
It's been great working with the logistic model. Now let's practice some more.