
Permutation and bootstrap hypothesis tests

1. Hypothesis tests

You just found that the mean active bout length is much longer for mutant fish, which have inhibited melatonin production, than for wild type fish, which have normal melatonin production.

2. Effects of mutation on activity

This is especially clear if we look at the confidence intervals graphically: mutating this gene clearly has an effect on activity.

3. Genotype definitions

In addition to mutant fish, Prof. Prober's lab also studied heterozygotic fish. These fish have one mutated copy of the gene and one functional copy, unlike mutant fish, which have two mutated copies, and wild type fish, which have two functional copies.

4. Effects of mutation on activity

When we do the same analysis for the heterozygotes, we see that the effect is much smaller.

5. Effects of mutation on activity

Indeed, if we look at the ECDFs of active bout length (here with the x-axis range adjusted for ease of comparison), we see only a slight difference between the wild type and heterozygotic fish. We have quantified this difference and seen it graphically; now is a good time to test the hypothesis that there is no difference between heterozygotic and wild type fish.
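An ECDF like the ones compared above can be computed with a short helper. This is a minimal sketch assuming only NumPy; the name `ecdf` and the sample bout lengths are hypothetical, not the lab's data.

```python
import numpy as np

def ecdf(data):
    """Return x, y values of the empirical cumulative distribution function.

    x holds the sorted data; for distinct values, y[i] is the fraction
    of data points less than or equal to x[i].
    """
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Hypothetical active bout lengths (minutes), for illustration only
wt = np.array([1.2, 3.4, 0.8, 2.0])
het = np.array([1.5, 2.9, 4.1, 0.9])

x_wt, y_wt = ecdf(wt)
x_het, y_het = ecdf(het)
```

Plotting each `(x, y)` pair as unconnected points on shared axes gives the kind of side-by-side comparison described above.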

6. Hypothesis test

A hypothesis test is an assessment of how reasonable the observed data are assuming a hypothesis, called the null hypothesis, is true.

7. p-value

The result of a hypothesis test is a p-value, defined as the probability of obtaining a value of your test statistic that is at least as extreme as what was observed, under the assumption that the null hypothesis is true.
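Written out for a test in which larger values of the statistic count as more extreme (the convention used later in this section), the definition and its simulation-based estimate read as follows. The symbols $T$ (test statistic), $t_{\text{obs}}$ (observed value), $t_i$ (value for the $i$-th simulated dataset), and $N$ (number of simulated datasets) are introduced here for illustration.

$$
p = P\left(T \ge t_{\text{obs}} \mid H_0\right)
\qquad\text{estimated by}\qquad
\hat{p} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\left[t_i \ge t_{\text{obs}}\right]
$$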

8. Test statistic

As a reminder, a test statistic is a single number that can be computed from observed data and from data you simulate under the null hypothesis to serve as a basis of comparison.

9. p-value

The p-value only makes sense if the null hypothesis, test statistic, and the meaning of "at least as extreme as" are clearly defined.
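Because "at least as extreme as" is a choice, the same simulated statistics can give different p-values. The sketch below uses a hypothetical `p_value` helper and made-up replicate values to contrast a one-sided definition (greater than or equal) with a two-sided one (absolute value greater than or equal). As written, the two-sided version assumes the statistic is centered on zero under the null hypothesis, as a difference of means is in a permutation test.

```python
import numpy as np

def p_value(reps, observed, alternative="greater"):
    """Fraction of simulated test statistics at least as extreme as observed.

    alternative="greater":   extreme means reps >= observed
    alternative="two-sided": extreme means |reps| >= |observed|
                             (assumes the null distribution is centered on 0)
    """
    reps = np.asarray(reps)
    if alternative == "greater":
        return np.mean(reps >= observed)
    if alternative == "two-sided":
        return np.mean(np.abs(reps) >= np.abs(observed))
    raise ValueError("alternative must be 'greater' or 'two-sided'")

# Hypothetical replicates, chosen only to show that the two definitions differ
reps = np.array([-3.0, -1.0, 0.5, 2.0, 2.5])
print(p_value(reps, 2.0, "greater"))    # 0.4: 2 of 5 reps are >= 2.0
print(p_value(reps, 2.0, "two-sided"))  # 0.6: 3 of 5 reps have |rep| >= 2.0
```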

10. Pipeline for hypothesis testing

So, the pipeline for doing a hypothesis test is to clearly state the null hypothesis and the test statistic. Then you *simulate* production of the data as if the null hypothesis were true. For each of these simulated datasets, compute the test statistic. The p-value is then the fraction of your simulated datasets for which the test statistic is at least as extreme as for the real data.
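To make the pipeline concrete, here is a generic sketch in Python with NumPy. The function name `simulate_p_value` and the coin-flip null hypothesis used to exercise it are hypothetical illustrations, not part of the zebrafish analysis or the `dc_stat_think` module.

```python
import numpy as np

def simulate_p_value(observed_stat, simulate_null_stat, n_reps, rng):
    """Estimate a p-value by simulating data under the null hypothesis.

    simulate_null_stat(rng) must generate one dataset as if the null
    hypothesis were true and return that dataset's test statistic.
    "At least as extreme" is taken to mean "greater than or equal to".
    """
    # Simulate many datasets under the null and compute the statistic for each
    null_stats = np.array([simulate_null_stat(rng) for _ in range(n_reps)])
    # Fraction of simulated statistics at least as extreme as the observed one
    return np.mean(null_stats >= observed_stat)


# Hypothetical example: the null hypothesis is that a coin is fair, the test
# statistic is the number of heads in 10 flips, and 9 heads were observed.
def heads_in_ten_fair_flips(rng):
    return rng.binomial(10, 0.5)

rng = np.random.default_rng(42)
p = simulate_p_value(9, heads_in_ten_fair_flips, n_reps=20_000, rng=rng)
print(p)  # should be close to the exact value 11/1024, about 0.011
```

Passing the simulation in as a function keeps the pipeline separate from any particular null hypothesis; a permutation test is one specific choice of `simulate_null_stat`.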

11. Specifying the test

Let's now consider the hypothesis that the active bout lengths of wild type and heterozygotic fish are identically distributed. We will use the difference in mean active bout length as the test statistic, and we will count a test statistic as "at least as extreme as" the observed one if it is greater than or equal to the observed value.
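A minimal sketch of this test statistic, assuming NumPy arrays of bout lengths. The array names and values are hypothetical placeholders; the sketch takes heterozygote minus wild type, so that "greater than or equal to" corresponds to heterozygotes having longer bouts.

```python
import numpy as np

def diff_of_means(data_1, data_2):
    """Difference in means of two arrays: mean(data_1) - mean(data_2)."""
    return np.mean(data_1) - np.mean(data_2)

# Hypothetical bout lengths; substitute the real wild type and heterozygote arrays
bout_lengths_wt = np.array([2.0, 3.0, 4.0])
bout_lengths_het = np.array([3.0, 5.0, 7.0])

# Observed test statistic (heterozygote minus wild type)
diff_means_exp = diff_of_means(bout_lengths_het, bout_lengths_wt)
print(diff_means_exp)  # 2.0
```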

12. Permutation test

The null hypothesis says that wild type and heterozygotic fish are completely indistinguishable with respect to their active bout lengths. To simulate data under it, you can scramble which bout lengths are labeled "wild type" and which are labeled "heterozygote", then compute the test statistic. Doing this over and over again gives many permutation replicates. This is called a *permutation test*.

You implemented this in the `draw_perm_reps()` function of the `dc_stat_think` module. The first two arguments are the two datasets you are comparing in the hypothesis test. The third argument is a function used to compute the test statistic; you already wrote one that computes the difference of means, and it is also included in the `dc_stat_think` module. The last argument says how many replicates to generate. Finally, the p-value is computed as the fraction of replicates at least as extreme as what was observed.
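Here is a minimal sketch of a permutation test, assuming only NumPy. It mirrors the call pattern described for `draw_perm_reps()` (two datasets, a statistic function, a replicate count), but it is not the `dc_stat_think` source: the keyword names `size` and `rng` are choices of this sketch, and the data are made up.

```python
import numpy as np

def draw_perm_reps(data_1, data_2, func, size=1, rng=None):
    """Draw permutation replicates of func(data_1, data_2).

    Each replicate pools the two datasets, shuffles the pooled values
    (scrambling the labels), splits them back into groups of the original
    sizes, and applies func to the two scrambled groups.
    """
    rng = np.random.default_rng() if rng is None else rng
    pooled = np.concatenate((data_1, data_2))
    n_1 = len(data_1)
    perm_reps = np.empty(size)
    for i in range(size):
        scrambled = rng.permutation(pooled)
        perm_reps[i] = func(scrambled[:n_1], scrambled[n_1:])
    return perm_reps

def diff_of_means(data_1, data_2):
    """Difference in means: mean(data_1) - mean(data_2)."""
    return np.mean(data_1) - np.mean(data_2)

# Hypothetical bout lengths, for illustration only
bout_lengths_wt = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
bout_lengths_het = np.array([6.0, 7.0, 8.0, 9.0, 10.0])

diff_means_exp = diff_of_means(bout_lengths_het, bout_lengths_wt)
perm_reps = draw_perm_reps(bout_lengths_het, bout_lengths_wt, diff_of_means,
                           size=10_000, rng=np.random.default_rng(42))

# p-value: fraction of replicates at least as extreme (>=) as observed
p_val = np.mean(perm_reps >= diff_means_exp)
print(p_val)
```

With these made-up data only one labeling (the original one) reaches the observed difference of 5, so the p-value comes out small, near 1/252. Passing `rng` explicitly makes the replicates reproducible.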

13. Let's practice!

Now you can go ahead and practice these techniques with zebrafish active bouts.