How are the Parkfield interearthquake times distributed?

1. How are the Parkfield interearthquake times distributed?

Knowing how the time between major earthquakes is distributed makes a big difference for assessing when the next earthquake will strike. It turns out that the Parkfield sequence has been central in the science of earthquake prediction.

2. The Parkfield Prediction

In the mid-1980s, seismologists predicted that the next Parkfield quake would occur in 1988, and almost certainly no later than 1993. They based their prediction on a linear regression, which essentially assumes a Gaussian model. But the earthquake did not come in 1988, nor in 1993; it came in late 2004. In light of this, you will work out in the exercises whether we can dismiss the Exponential model, presumably in favor of the Gaussian model. For now, as an illustration, we will look at the Nankai Trough earthquakes in the context of the Gaussian model.

3. Hypothesis test on the Nankai megathrust earthquakes

We will test the hypothesis that the times between Nankai megathrust earthquakes are Normally distributed, parametrized with the mean and standard deviation calculated from the observed earthquakes. We are left to specify the test statistic and what it means to be "at least as extreme as" the observed value.

4. The Kolmogorov-Smirnov statistic

What is a reasonable test statistic to measure how close an ECDF is to a theoretical CDF, in this case a Normal CDF?

5. The Kolmogorov-Smirnov statistic

Above, I plot the distance of the ECDF from the theoretical Normal CDF. We might take as our test statistic the maximum of these distances. This seems like a reasonable measure of the distance between the empirical CDF and that of the distribution we are testing against.

6. The Kolmogorov-Smirnov statistic

The maximal distance occurs around 150 years and has a value of about 0.2. This maximal distance has a name: the Kolmogorov-Smirnov statistic, or K-S statistic for short. But how do we compute it without having to make graphs like this?

7. The Kolmogorov-Smirnov statistic

It helps to look at where the local maximal distances occur. In every case, the local maximum is at a corner of the formal ECDF.

8. The Kolmogorov-Smirnov statistic

Note that the corner can be either a concave corner at the top of a step or a convex corner at the base of a step. You will use these ideas to write a function to compute the Kolmogorov-Smirnov statistic in the exercises.
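Since the local maxima sit at the corners of the ECDF, one way to compute the statistic is to evaluate the target CDF at each sorted data point and compare it with the step heights: i/n at the top of each step and (i-1)/n at its base. Here is a minimal sketch of that idea, approximating the theoretical CDF with the ECDF of a large sample as described in this video; `ks_stat()` is a hypothetical helper name, and you will write your own version in the exercises.

```python
import numpy as np

def ks_stat(data, theor_data):
    """K-S distance between the ECDF of `data` and the theoretical CDF,
    where the theoretical CDF is approximated by the ECDF of a large
    sample (`theor_data`) drawn from the theoretical distribution."""
    x = np.sort(data)
    n = len(x)
    # Theoretical CDF at each data point: fraction of theoretical
    # samples less than or equal to x_i
    cdf = np.searchsorted(np.sort(theor_data), x, side='right') / len(theor_data)
    # Distances at the concave corners (top of each step): i/n - F(x_i)
    d_top = np.arange(1, n + 1) / n - cdf
    # Distances at the convex corners (base of each step): F(x_i) - (i-1)/n
    d_bottom = cdf - np.arange(n) / n
    # The maximum over all corners is the K-S statistic
    return np.max(np.concatenate((d_top, d_bottom)))
```

Because the supremum distance between a step function and a continuous CDF must occur at one of the steps' corners, checking only those 2n candidate distances suffices.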

9. Kolmogorov-Smirnov test

So, now that we have our test statistic, which is always positive, it is clear that "at least as extreme as" means that the simulated K-S statistic is greater than or equal to the observed K-S statistic. The hypothesis test we just defined is called the Kolmogorov-Smirnov test. We are now left to figure out how to simulate acquiring the data under the null hypothesis.

10. Simulating the null hypothesis

Taking a hacker stats approach, we first generate the theoretical CDF by drawing many, like ten thousand, samples and storing them. Now, say we have *n* data points. For the Nankai dataset, *n* = 8. Then, to generate each Kolmogorov-Smirnov replicate, we draw *n* samples from the theoretical distribution. We then compute the K-S statistic using these *n* samples and the ten thousand samples we drew out of the theoretical distribution. Here is the technique in code. We can use functions in NumPy's `random` module to make the samples. The key part of the test, then, is computing the K-S statistic. You will write the `ks_stat()` function to do this in the exercises. Incidentally, the p-value for the hypothesis that the Nankai Trough earthquakes follow the Gaussian model is close to 0.9, so the data are commensurate with that model.

11. Simulating the null hypothesis
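The simulation just described can be sketched as follows. This is a hedged, self-contained sketch, not the course's exact code: the `ks_stat()` helper is one possible implementation of the function you will write in the exercises, and `time_gap` is a randomly generated stand-in for the real Nankai interearthquake times.

```python
import numpy as np

def ks_stat(data, theor_data):
    """K-S distance between the ECDF of `data` and the theoretical CDF,
    approximated by the ECDF of a large sample (`theor_data`)."""
    x = np.sort(data)
    n = len(x)
    cdf = np.searchsorted(np.sort(theor_data), x, side='right') / len(theor_data)
    d_top = np.arange(1, n + 1) / n - cdf      # concave corners
    d_bottom = cdf - np.arange(n) / n          # convex corners
    return np.max(np.concatenate((d_top, d_bottom)))

rng = np.random.default_rng(42)

# Stand-in for the observed interearthquake times (years); the real
# Nankai time gaps would go here.
time_gap = rng.normal(180.0, 60.0, size=8)

# Parametrize the Normal model from the observed data
mean_gap = np.mean(time_gap)
std_gap = np.std(time_gap)

# Ten thousand samples standing in for the theoretical Normal CDF
theor_samples = rng.normal(mean_gap, std_gap, size=10_000)

# Observed K-S statistic
d_obs = ks_stat(time_gap, theor_samples)

# K-S replicates under the null hypothesis: draw n samples from the
# theoretical distribution and compute the K-S statistic each time
n = len(time_gap)
reps = np.array(
    [ks_stat(rng.normal(mean_gap, std_gap, size=n), theor_samples)
     for _ in range(1000)]
)

# p-value: fraction of replicates at least as extreme as observed
p_val = np.sum(reps >= d_obs) / len(reps)
print('p-value:', p_val)
```

Because `time_gap` here is itself drawn from a Normal distribution, the p-value will typically be large; with the real Nankai data and this Gaussian model, the video reports a p-value close to 0.9.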

12. Let's practice!

Now it's time for you to compute p-values for the Parkfield sequence to see how it jibes with the Exponential model.