Link functions: Probit compared to logit

1. Link functions: Probit compared to logit

Welcome back! In this lesson, you will learn about link functions. In the last lesson, you learned about the logit link function, and we mentioned the log link (whose inverse is the exponential function) used with Poisson regression in Chapter 1.

2. Why link functions?

Understanding link functions is important for understanding GLMs as well as for being able to simulate them. We will do this by comparing the probit link to the logit link function for binomial regression. The main goal of this section is to provide you with a concrete example of two different link functions. You will also learn how to run probit regression in R.

3. Why probit?

We're looking at a probit because it demonstrates a link function. Some fields prefer it over a logit due to customs and conventions. Likewise, some people prefer it. Personally, I seldom use it.

4. What is a probit?

You have likely never heard of probit regression before. I had not until I took a graduate-level statistics course. Probit is short for probability unit. The analysis was first published by Chester Bliss in 1934 for modeling dose-response curves. This method can be computationally easier than a logit, something that was important before modern computing. I have seen this model referred to as probit analysis, probit regression, or the probit model.

5. Probit equation

Like logistic regression, probit regression assumes a binomial distribution. However, it uses a different link function, the probit, which is often denoted as inverse-phi. On the probit scale, the model is a linear equation, just like linear regression or the logistic equation on the logit scale.
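
Written out, the model just described might look like the following. The symbols Y, p, beta_0, beta_1, and x are notation assumed here for illustration, with a single predictor x:

```latex
Y \sim \mathrm{Binomial}(1,\, p), \qquad \Phi^{-1}(p) = \beta_0 + \beta_1 x
```

Equivalently, p = Phi(beta_0 + beta_1 x), where Phi is the standard normal cumulative distribution function.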

6. Probit function

The probit function is the inverse of the cumulative distribution function of a standard normal distribution, that is, a normal distribution with a mean of 0 and sigma equal to 1. Recall that a normal cumulative distribution takes an input z and gives the probability of observing a value less than or equal to z given a specified distribution. The input variable z for the probit is a z-score that corresponds to that standard normal distribution.
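
As a small sketch of this relationship in R, `pnorm()` maps a z-score to a probability and `qnorm()` maps it back. The z values below are arbitrary examples chosen for illustration:

```r
# Arbitrary example z-scores (illustrative values only)
z <- c(-1.96, 0, 1.96)

# Standard normal CDF: z-score -> probability of a value <= z
p <- pnorm(z)

# Probit link (inverse of the CDF): probability -> z-score
z_back <- qnorm(p)

print(data.frame(z = z, p = p, z_back = z_back))
```

The round trip returns the original z-scores, which is what makes `qnorm()` usable as a link function and `pnorm()` as its inverse.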

7. How does a probit compare to a logit?

The logit curve, shown in red, and the probit curve, shown in blue, are similar. However, the logit has "fatter tails"; that is to say, the left and right ends of the curve approach their limits a little more slowly than those of a probit. This can make the logit better at modeling outliers or rare events.
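
One rough way to see the difference in the tails is to evaluate both inverse links at the same inputs. This sketch assumes the two curves are compared on their native scales, without any rescaling, and the x values are arbitrary:

```r
# Arbitrary inputs on the linear-predictor scale (illustrative values only)
x <- c(-4, -2, 0, 2, 4)

# Inverse logit and inverse probit at the same inputs
logit_p  <- plogis(x)
probit_p <- pnorm(x)

tails <- data.frame(x = x, logit_p = logit_p, probit_p = probit_p)
print(tails)
```

At the extremes, the logit probabilities stay further from 0 and 1 than the probit probabilities do.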

8. Fitting a probit in R

The family argument of glm() accepts several types of input. So far, you have seen it given as a character string, in quotes. glm() can also take a family function, but notice the lack of quotes. The default link for a binomial family is the logit, but this can be changed to a probit. When doing DataCamp exercises, please match the instructions so that our solution-testing software will correctly score your answer.
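
A minimal sketch of these options, using a small simulated data set. The data frame `dat`, its columns, and the coefficient values are made up for illustration and are not from the course exercises:

```r
set.seed(42)

# Hypothetical data: one predictor and a 0/1 outcome
dat <- data.frame(x = rnorm(200))
dat$y <- rbinom(200, size = 1, prob = pnorm(0.5 + 1 * dat$x))

# Family given as a character string (quotes); link defaults to logit
fit_chr <- glm(y ~ x, data = dat, family = "binomial")

# Family given as a function (no quotes); link still defaults to logit
fit_fun <- glm(y ~ x, data = dat, family = binomial)

# Family function with the link changed to probit
fit_probit <- glm(y ~ x, data = dat, family = binomial(link = "probit"))

print(coef(fit_chr))
print(coef(fit_probit))
```

The first two calls fit the same logit model; only the third changes the link.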

9. Simulate with probit

As a data scientist, I sometimes want to simulate data in order to test models or help design studies. To do this for probit analysis, I first convert a z-score to a probability. This is done with the `pnorm()` function. Next, I use the resulting probability with a binomial distribution to simulate 0 or 1 outcomes.
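
These two steps could be sketched as follows. The sample size, predictor, and coefficients are hypothetical values picked for illustration:

```r
set.seed(1)

n <- 1000
x <- rnorm(n)            # hypothetical predictor

# Linear predictor on the probit (z-score) scale; coefficients are made up
z <- -0.5 + 1.2 * x

# Step 1: convert z-scores to probabilities
p <- pnorm(z)

# Step 2: draw 0/1 outcomes from a binomial distribution with size 1
y <- rbinom(n, size = 1, prob = p)

print(table(y))
```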

10. Simulate with logit

Simulating with the logit is similar to simulating with the probit, except that you use the plogis() function rather than pnorm(). First, convert from the logit scale to probabilities with plogis(). Second, use the probabilities in a binomial distribution. This is done using the rbinom() function.
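
A comparable sketch for the logit, again with hypothetical coefficients and sample size. Refitting the model at the end is an extra step added here to check that the simulated data behave as expected:

```r
set.seed(2)

n <- 1000
x <- rnorm(n)            # hypothetical predictor

# Linear predictor on the logit scale; coefficients are made up
eta <- -0.5 + 1.2 * x

# First: convert from the logit scale to probabilities
p <- plogis(eta)

# Second: draw 0/1 outcomes from a binomial distribution with size 1
y <- rbinom(n, size = 1, prob = p)

# Optional check: a logit GLM should roughly recover the coefficients
fit <- glm(y ~ x, family = "binomial")
print(coef(fit))
```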

11. When to use probit vs logit?

After seeing these examples, you might wonder when to use a logit and when to use a probit. The answer is largely domain-specific, and usually either model works well. A logit has thicker tails, which makes it slightly better at predicting outlier events. But either approach is defensible for a data scientist.

12. Let's practice!

Now, let's look at probits and link functions.
