1. Logistic regression
Logistic regression is a widely used predictive modeling technique. In this video you will learn how logistic regression predicts the target from candidate predictors, and how to use logistic regression in Python.
2. Logistic regression: intuition
Recall from the coding exercises that elderly people are more likely to donate. Indeed, if we plot the target as a function of age for all donors in the population, we see that a 1 occurs more often to the right, where the older donors are. If we fit a regression line through these points, it is of the form a*x+b, where a is a positive number. Here, a is called the coefficient of age, and b is called the intercept.
If we plot the target as a function of the time since the last donation for each donor, we see that people who donated recently are more likely to donate. In this case, the coefficient of recency is negative.
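As a rough illustration of the sign of a, the sketch below fits a straight line through simulated (age, target) points with NumPy. The data is not the donor population from the course: the sample size and the donation probabilities are assumptions, chosen only so that older people donate more often.

```python
import numpy as np

# Simulated donors (an assumption, not the course data): the chance of
# donating increases with age.
rng = np.random.default_rng(0)
age = rng.integers(18, 90, size=10_000)
p_donate = 0.02 + 0.001 * (age - 18)      # grows linearly with age
target = rng.binomial(1, p_donate)        # 1 = donated, 0 = did not donate

# Least-squares fit of a straight line: target is approximated by a * age + b.
a, b = np.polyfit(age, target, deg=1)
print("coefficient a:", a)
print("intercept b:", b)
```

Because the simulated donation probability rises with age, the fitted coefficient a should come out positive. Reversing the relationship, as with recency, would give a negative coefficient.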
3. Logistic regression: the logit function
The regression line we constructed can be used as a predictive model. However, its output can be any real number.
It would be more convenient to obtain a probability as output, a number between 0 and 1 that expresses how likely it is that someone will donate.
Luckily, we can use the logit function to that end, or more precisely its inverse, the logistic (sigmoid) function: this function takes the output of the regression formula as input and calculates a probability from it, as shown in this graph. You can see that the output is indeed a number between 0 and 1. This little mathematical trick allows us to use linear regression for binary classification problems.
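To make the transformation concrete, here is a minimal sketch of the function that turns the output of a*x+b into a probability, 1 / (1 + exp(-z)). The coefficient and intercept values are placeholders chosen for illustration, not results from a fitted model.

```python
import math

def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Placeholder coefficient and intercept (assumptions for illustration).
a, b = 0.02, -4.0

for age in (20, 50, 80):
    z = a * age + b                  # output of the regression formula
    print(age, logistic(z))          # probability between 0 and 1
```

With a positive coefficient, the probability increases with age but never leaves the interval from 0 to 1.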
4. Logistic regression in Python
You can build a logistic regression model using the linear_model module from sklearn. First, you create a logistic regression model object using the LogisticRegression class. Next, you need to feed data to the model so that it can be fit. The predictor and target are stored in two separate objects, X and Y, using indexing. Both X and Y are passed to the fit method of the logistic regression model.
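The steps just described can be sketched as follows. The basetable here is simulated, since the course data is not part of this transcript. The column names age and target, the sample size, and the max_iter setting are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Simulated stand-in for the basetable (an assumption, not the course data).
rng = np.random.default_rng(0)
age = rng.integers(18, 90, size=10_000)
p_donate = 1 / (1 + np.exp(-(0.02 * age - 4)))
basetable = pd.DataFrame({"age": age, "target": rng.binomial(1, p_donate)})

# Create the logistic regression model object.
logreg = linear_model.LogisticRegression(max_iter=1000)

# Select the predictor and the target using indexing.
X = basetable[["age"]]     # 2-D: one column per predictor
y = basetable["target"]    # 1-D: the target (Y in the text)

# Fit the model on the data.
logreg.fit(X, y)
```

The double brackets in basetable[["age"]] keep X two-dimensional, which is the shape fit expects even when there is only one predictor.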
After the model is fit, you can inspect the coefficient that corresponds to the predictor age by checking the coef_ attribute of the fitted model. In this case, the coefficient is positive, namely 0.02, as we expected.
If you want to derive the entire formula from the fitted model, you can also retrieve the intercept by checking the intercept_ attribute, which is about -4.
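As a sketch of how the formula can be read off a fitted model, the example below fits on simulated data (an assumption: it is generated with a coefficient of 0.02 and an intercept of -4 to mimic the values mentioned above, so the fitted values will only be close to them). It then checks that applying the logistic function to a*x+b by hand gives the same probabilities as the model.

```python
import numpy as np
from sklearn import linear_model

# Simulated data (an assumption, not the course data).
rng = np.random.default_rng(0)
age = rng.integers(18, 90, size=10_000)
target = rng.binomial(1, 1 / (1 + np.exp(-(0.02 * age - 4))))
X = age.reshape(-1, 1)

logreg = linear_model.LogisticRegression(max_iter=1000).fit(X, target)

a = logreg.coef_[0, 0]       # coef_ has shape (1, number of predictors)
b = logreg.intercept_[0]     # intercept_ has shape (1,)
print(f"formula: {a:.3f} * age + {b:.3f}")

# Applying the logistic function to a * age + b by hand should match
# predict_proba (column 1 holds the probability that the target is 1).
manual = 1 / (1 + np.exp(-(a * age + b)))
print(np.allclose(manual, logreg.predict_proba(X)[:, 1]))
```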
5. Multivariate logistic regression
Until now, we assumed that there is only one predictor. However, many candidate predictors are available in the basetable. Extending univariate logistic regression to multivariate logistic regression is pretty straightforward: instead of using a*x+b, the formula becomes a1*x1 + a2*x2 + ... + an*xn + b, with one coefficient for each predictor.
In Python, nothing changes, apart from the fact that you now need to select multiple variables in the X object. If you output the coefficients, you see that a coefficient is calculated for each predictor used.
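A minimal sketch of the multivariate case is shown below. The basetable is again simulated. The predictor names age, gender_F, and time_since_last_gift, as well as the values used to generate the target, are hypothetical and serve only to show that one coefficient is returned per predictor.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Simulated basetable with hypothetical predictor names (assumptions).
rng = np.random.default_rng(0)
n = 10_000
basetable = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "gender_F": rng.integers(0, 2, size=n),
    "time_since_last_gift": rng.integers(0, 2000, size=n),
})
z = (0.02 * basetable["age"]
     + 0.1 * basetable["gender_F"]
     - 0.001 * basetable["time_since_last_gift"]
     - 3).to_numpy()
basetable["target"] = rng.binomial(1, 1 / (1 + np.exp(-z)))

# The only change from the univariate case: select several columns in X.
predictors = ["age", "gender_F", "time_since_last_gift"]
X = basetable[predictors]
y = basetable["target"]

logreg = linear_model.LogisticRegression(max_iter=1000)
logreg.fit(X, y)

# One coefficient per predictor, in the same order as the columns of X.
for name, coef in zip(predictors, logreg.coef_[0]):
    print(name, coef)
print("intercept", logreg.intercept_[0])
```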
6. Let's practice!
You should now be ready to construct your first logistic regression model! Let's practice!