p-values

1. p-values

Hypothesis tests are like criminal trials.

2. Criminal trials

Here's a simplified version of how criminal trials work. The true situation is that the defendant committed the crime, or they didn't. There are also two possible outcomes. The judge gives a guilty or a not guilty verdict. The initial assumption is that the defendant is not guilty. It's up to the prosecution team to come up with evidence beyond a reasonable doubt that the defendant committed the crime in order for a guilty verdict to be given.

3. Age of first programming experience

Let's return to the Stack Overflow survey. The age_first_code_cut variable classifies when the user began programming. If they were 14 or older, they are classified as adult; otherwise child. Suppose previous research suggests that thirty-five percent of software developers programmed as children. This raises a question answerable with our dataset. Does our sample provide evidence that a greater proportion of data scientists started programming as children?

4. Definitions

Let's specify some definitions. A hypothesis is a statement about a population parameter. We don't know the true value of this population parameter; we can only make inferences about it from the data. Hypothesis tests compare two competing hypotheses. These two hypotheses are the null hypothesis, representing the existing idea, and the alternative hypothesis, representing a new idea. They are denoted H-naught and H-A respectively. The null hypothesis is like the current champion, and the alternative hypothesis is like a challenger trying to overthrow that champion. Here, the null hypothesis is that our data won't tell us anything new, and that the proportion of data scientists who started programming as children matches the previous research on software developers, at thirty-five percent. The alternative hypothesis is that our hunch is correct, and that the proportion is greater than thirty-five percent.
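Written out in symbols, the two hypotheses for this example look like the following. The symbol p is our own notation for the population proportion of data scientists who started programming as children; the lesson itself only names H-naught and H-A.

```latex
H_0: p = 0.35 \qquad H_A: p > 0.35
```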

5. Criminal trials vs. hypothesis testing

Let's compare the criminal trial with the hypothesis test. The defendant committing the crime is equivalent to the alternative hypothesis being true, and the defendant not committing the crime is equivalent to the null hypothesis being true. Rather than saying we accept the alternative hypothesis, the verdicts are rejecting the null hypothesis, or failing to reject the null hypothesis. Initially, we assume that the null hypothesis is true, and this only changes if the sample provides enough evidence to reject it. The hypothesis testing equivalent of "beyond a reasonable doubt" is known as the significance level.

6. One-tailed and two-tailed tests

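The tails of a distribution are the left and right edges of its PDF. Hypothesis tests determine whether the sample statistic lies in the tails of the null distribution. There are three types of tests: left-tailed, right-tailed, and two-tailed. The phrasing of the alternative hypothesis determines which type you should use. If the alternative says the parameter is less than the hypothesized value, use a left-tailed test; if it says greater than, use a right-tailed test; if it only says different from, use a two-tailed test. In this case, the alternative hypothesis is that the proportion is greater than thirty-five percent, so we need a right-tailed test.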
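As a rough sketch of what the three tail types mean numerically, here is how the tail areas could be computed for a standard normal distribution in R. The z-score of 1.5 is a made-up value for illustration, and pnorm is the normal cumulative distribution function that the lesson uses later.

```r
# Hypothetical z-score, chosen only for illustration
z_score <- 1.5

# Alternative says "less than": left tail, area to the left of z
area_left <- pnorm(z_score, lower.tail = TRUE)

# Alternative says "greater than": right tail, area to the right of z
area_right <- pnorm(z_score, lower.tail = FALSE)

# Alternative says "different from": both tails, twice the area beyond |z|
area_two <- 2 * pnorm(abs(z_score), lower.tail = FALSE)

c(left = area_left, right = area_right, two_tailed = area_two)
```

The left and right areas always add up to one, which is why picking the wrong tail can turn a tiny area into one close to one.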

7. p-values

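p-values measure the strength of support for the null hypothesis. Large p-values mean our statistic is not in a tail of the null distribution, so chance is a reasonable explanation for the result. Small p-values mean our statistic lies in a tail of the null distribution, a result that would be unlikely if the null hypothesis were true. p-values are probabilities, so they are always between zero and one.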

8. Defining p-values

The definition of a p-value has four parts. It's a probability related to where the statistic from our sample lies on the null distribution. It measures the proportion of the null distribution that is at least as extreme as what we observed, in the direction of the alternative hypothesis. It is based on the null distribution, which assumes that the null hypothesis is true. The p-value measures evidence. Does it favor the original assumption or the new alternative?
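One way to make this definition concrete is to simulate a null distribution and count how much of it lies at or beyond the observed statistic. This is only a sketch: the sample size, the observed proportion, and the use of rbinom to generate the null distribution are all assumptions for illustration, not values or methods from the survey analysis.

```r
set.seed(42)

# Assumed values for illustration only (not from the survey data)
n <- 1000           # hypothetical sample size
p_null <- 0.35      # hypothesized proportion from the null hypothesis
p_observed <- 0.39  # hypothetical sample proportion

# Null distribution: sample proportions we would see if the null hypothesis were true
null_props <- rbinom(10000, size = n, prob = p_null) / n

# Right-tailed p-value: share of null proportions at least as large as the observed one
p_value <- mean(null_props >= p_observed)
p_value
```

Because the alternative hypothesis points to the right, only values at or above the observed proportion are counted.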

9. Calculating the z-score

To calculate the p-value, we need to run through the same steps as before. We get the sample statistic, in this case the proportion of data scientists who started programming as children. We get the hypothesized value from the null hypothesis, thirty five percent. We get the standard error from the bootstrap distribution. The z-score is the difference between the proportions, divided by the standard error.
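The steps above might look roughly like this in base R. The started_as_child vector is a simulated stand-in for the survey column, and the variable names, sample size, and number of bootstrap replicates are all hypothetical choices made for this sketch.

```r
set.seed(123)

# Hypothetical stand-in for the survey column: TRUE means started programming as a child
started_as_child <- rbinom(2000, size = 1, prob = 0.39) == 1

# Sample statistic: proportion who started as children
prop_child_samp <- mean(started_as_child)

# Hypothesized value from the null hypothesis
prop_child_hyp <- 0.35

# Bootstrap distribution: resample with replacement and recompute the proportion
boot_props <- replicate(
  2000,
  mean(sample(started_as_child, replace = TRUE))
)

# Standard error: standard deviation of the bootstrap distribution
std_error <- sd(boot_props)

# z-score: difference between the proportions, divided by the standard error
z_score <- (prop_child_samp - prop_child_hyp) / std_error
z_score
```

The z-score expresses how many standard errors the sample proportion sits above the hypothesized value.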

10. Calculating the p-value

The last step is new. We pass the z-score to the normal cumulative distribution function, pnorm. Since we want the right-hand tail for this test, we set lower.tail to FALSE. The p-value is three out of one hundred thousand. That's a very small number, but is it small enough to reject the null hypothesis?
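A minimal sketch of this step follows. The z-score of 4 is an assumed value for illustration, since the transcript does not state the actual one; it happens to give a p-value of roughly three in one hundred thousand.

```r
# Assumed z-score for illustration; the transcript does not state the actual value
z_score <- 4

# Right-tailed test: area under the standard normal curve to the right of z
p_value <- pnorm(z_score, lower.tail = FALSE)
p_value

# The same tail area written as one minus the left-hand area
p_value_alt <- 1 - pnorm(z_score)
```

Leaving lower.tail at its default of TRUE would instead return the left-hand area, which is close to one here and would answer the wrong question for a right-tailed test.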

11. Let's practice!

That's the goal of the next lesson.