The logistic distribution

1. The logistic distribution

In order to understand logistic regression, you need to know about the logistic distribution.

2. Gaussian probability density function (PDF)

Before we get to the logistic distribution, let's look at the Gaussian, or normal distribution. Hopefully, you are familiar with the famous "bell curve" of its probability density function, made with scipy's norm dot pdf function. For the purposes of regression, we care more about the area under this curve. By integrating the norm dot pdf function - calculating the area underneath it - we get another curve, known as the cumulative distribution function.
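To make the link between area and the CDF concrete, here is a minimal sketch that integrates norm.pdf numerically and compares the result with norm.cdf. The use of scipy.integrate.quad and the three x-values are assumptions chosen for illustration; they are not taken from the slides.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative points at which to measure the area under the bell curve
x_values = [-1.0, 0.0, 1.0]

for x in x_values:
    # Area under norm.pdf from minus infinity up to x
    area, _abs_error = quad(norm.pdf, -np.inf, x)
    print(f"x = {x:+.1f}: area = {area:.4f}, norm.cdf = {norm.cdf(x):.4f}")
```

The two columns should agree to several decimal places, since the CDF is the running area under the PDF.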

3. Gaussian cumulative distribution function (CDF)

To get the cumulative distribution function, or CDF, you call norm dot cdf instead of pdf. The y-axis is near zero on the far left of the plot, and near one on the far right of the plot. This is a feature of the CDF curve for all distributions. As x approaches its minimum possible value, in this case minus infinity, y approaches zero. As x approaches its maximum possible value, in this case infinity, y approaches one. You can think of the CDF as a transformation from the values of x to probabilities.
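As a rough check of that behavior, the sketch below evaluates norm.cdf on a grid of x-values. The grid from minus ten to ten is an assumption standing in for "far left" and "far right"; the slide's actual plotting range is not stated.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical grid of x-values; -10 and 10 stand in for the far left and right
x = np.linspace(-10, 10, 201)
cdf_values = norm.cdf(x)

far_left = cdf_values[0]    # CDF at x = -10, very close to zero
far_right = cdf_values[-1]  # CDF at x = 10, very close to one
middle = norm.cdf(0.0)      # the standard normal is symmetric about zero

print(far_left, middle, far_right)
```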

4. Gaussian cumulative distribution function (CDF)

When x is one, the CDF curve is at zero-point-eight-four. That means that for a normally distributed variable x, the probability that x is less than one is eighty-four percent.
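One way to convince yourself of that number is to simulate it. This sketch draws standard normal samples and counts the share below one; the seed and the sample size are arbitrary choices made for the example.

```python
import numpy as np
from scipy.stats import norm

# Arbitrary seed and sample size, chosen only to make the example repeatable
rng = np.random.default_rng(seed=0)
samples = rng.standard_normal(100_000)

# Share of simulated values that fall below one
empirical = np.mean(samples < 1)

# Probability from the CDF itself
theoretical = norm.cdf(1)

print(f"simulated: {empirical:.3f}, norm.cdf(1): {theoretical:.3f}")
```

The simulated share will wobble slightly from run to run with a different seed, but it should stay close to the CDF value.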

5. Gaussian inverse CDF

Since the CDF transforms from x-values to probabilities, you also need a way to get back from probabilities to x-values. This is the inverse CDF, also known as the percent point function (PPF) or the quantile function. Here we have a new dataset with probabilities from nearly zero to nearly one. The inverse CDF is calculated with norm dot ppf. The line plot you see is the same as the CDF plot from the previous slide, but with the x and y axes flipped.
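Here is a small sketch of that round trip: start from probabilities, map them to x-values with norm.ppf, then map back with norm.cdf. The probability grid from 0.001 to 0.999 is an assumption standing in for "nearly zero to nearly one".

```python
import numpy as np
from scipy.stats import norm

# Assumed grid of probabilities, stopping short of exactly zero and one
p = np.linspace(0.001, 0.999, 999)

# Inverse CDF: from probabilities to x-values
x = norm.ppf(p)

# CDF: from those x-values back to probabilities
round_trip = norm.cdf(x)

print(np.allclose(round_trip, p))
```

Zero and one themselves are left out because norm.ppf returns minus infinity and infinity at those endpoints.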

6. Logistic PDF

Here's the logistic probability density function. It looks a little bit like the Gaussian PDF, but the tails at the extreme left and right of the plot are fatter.
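To put a number on "fatter tails", this sketch compares the two densities at a point far from the center, using scipy.stats.logistic. The point x equals four is an arbitrary choice. The second comparison rescales the logistic distribution to unit variance, which is an extra step added here and not something shown on the slide.

```python
import numpy as np
from scipy.stats import logistic, norm

# Arbitrary point out in the right-hand tail
x_tail = 4.0

# Densities of the standard logistic and standard Gaussian at that point
logistic_density = logistic.pdf(x_tail)
gaussian_density = norm.pdf(x_tail)

# Assumption for a like-for-like comparison: the standard logistic has
# variance pi**2 / 3, so this scale gives it a variance of one
unit_var_scale = np.sqrt(3) / np.pi
rescaled_logistic_density = logistic.pdf(x_tail, scale=unit_var_scale)

print(logistic_density, rescaled_logistic_density, gaussian_density)
```

In both comparisons the logistic density in the tail is larger than the Gaussian one.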

7. Logistic distribution

The CDF for the logistic distribution is also known as the logistic function. The two terms are interchangeable. It has a fairly simple equation: one divided by the quantity one plus e to the power of minus x. The inverse CDF is sometimes called the logit function; again, the terms are interchangeable. This may ring a bell: recall from the previous course that logit is also known as the log odds for describing predictions. Its equation is the logarithm of p over one minus p, where p is a probability. In order to see what these curves look like, you'll have to try the exercises.
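The two equations can be written out by hand and checked against scipy. In this sketch the function names logistic_function and logit are hypothetical helpers defined just for the example, and the grids of x-values and probabilities are arbitrary.

```python
import numpy as np
from scipy.stats import logistic

def logistic_function(x):
    """Hand-written logistic CDF: 1 / (1 + e^(-x))."""
    return 1 / (1 + np.exp(-x))

def logit(p):
    """Hand-written logistic inverse CDF: log(p / (1 - p))."""
    return np.log(p / (1 - p))

# Arbitrary grids for the comparison
x = np.linspace(-5, 5, 11)
p = np.linspace(0.01, 0.99, 99)

# Compare the hand-written formulas with scipy's logistic distribution
print(np.allclose(logistic_function(x), logistic.cdf(x)))
print(np.allclose(logit(p), logistic.ppf(p)))

# The logit undoes the logistic function
print(np.allclose(logit(logistic_function(x)), x))
```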

8. Let's practice!

Let's have a look at those curves!