Timing of major earthquakes and the Parkfield sequence

1. Timing of major earthquakes

In the last set of exercises, you looked at *the power* of earthquakes. Now, you will look at the *timing* of earthquakes.

2. Models for earthquake timing

The study of earthquake timing has produced many models, but there are two main textbook models, and those are the two we will study here. First, there is the Exponential model, which assumes that earthquakes occur as a Poisson process, so the times between earthquakes are Exponentially distributed. This means that the time since the last earthquake has no bearing on when the next one will happen. Then there is the Gaussian, or Normal, model. Under this model, earthquakes happen roughly periodically: the time between successive earthquakes is Normally distributed about a characteristic mean interval.
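To make the difference between the two models concrete, here is a small sketch (an illustration, not part of the original analysis) that asks each model the same question: given that *s* years have passed without a quake, what is the chance of no quake in the next 20 years? It uses the survival function P(T > t) of each waiting-time distribution. The helper names (`exp_survival`, `normal_survival`, `prob_quiet_another`) and the parameter values are hypothetical, chosen only for illustration.

```python
import math

def exp_survival(t, tau):
    """P(T > t) for an Exponential waiting time with mean tau."""
    return math.exp(-t / tau)

def normal_survival(t, mu, sigma):
    """P(T > t) for a Normal waiting time with mean mu and standard deviation sigma."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))

def prob_quiet_another(t, s, survival, *params):
    """P(T > s + t | T > s): the chance of no quake in the next t years,
    given that s years have already passed without one."""
    return survival(s + t, *params) / survival(s, *params)

# Hypothetical parameters, for illustration only
tau = mu = 200.0  # mean interval (years) under both models
sigma = 50.0      # spread of intervals under the Normal model

for s in (0.0, 100.0, 180.0):
    p_exp = prob_quiet_another(20.0, s, exp_survival, tau)
    p_norm = prob_quiet_another(20.0, s, normal_survival, mu, sigma)
    print(f"{s:5.0f} years elapsed: Exponential {p_exp:.3f}, Normal {p_norm:.3f}")
```

Under the Exponential model the probability stays at about 0.905 no matter how long it has been since the last quake, while under the Normal model it drops as the elapsed time approaches the mean interval. That contrast is what "no bearing on when the next will happen" versus "periodic" means in practice.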

3. Stable continental region earthquakes

As an example, let's look at the timing of earthquakes that happen away from plate boundaries, in the so-called stable continental regions around the world. Here is the ECDF of the time between these quakes. I have omitted inter-earthquake times shorter than two weeks so as not to count aftershocks. The time between earthquakes closely follows an Exponential distribution, with the theoretical CDF overlaid in green. This makes sense: earthquakes in different regions around the world are probably not correlated with one another. The picture is not so clear, though, if we restrict ourselves to the earthquakes we are more interested in predicting: very powerful earthquakes along faults.
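The two-week cutoff can be sketched as follows, assuming the event times are available as an array of days since some reference date. The function name `interquake_times` and its `min_gap_days` parameter are hypothetical; the source does not say how the filtering was actually implemented.

```python
import numpy as np

def interquake_times(event_times_days, min_gap_days=14.0):
    """Gaps between consecutive events (days), dropping gaps shorter than
    min_gap_days, which are treated as possible aftershocks."""
    times = np.sort(np.asarray(event_times_days, dtype=float))
    gaps = np.diff(times)
    return gaps[gaps >= min_gap_days]

# Hypothetical event times (days since a reference date), for illustration only
times = [0.0, 3.0, 40.0, 41.0, 400.0, 1000.0]
print(interquake_times(times))  # keeps the gaps of 37, 359 and 600 days
```

This mirrors the simple rule described above and is not a full aftershock declustering method: when a short gap is dropped, the next gap is still measured from the aftershock rather than from the mainshock.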

4. The Nankai Trough

For example, let's consider the magnitude eight-plus megathrust earthquakes that have happened along the Nankai Trough off the coast of Honshu in Japan.

5. Earthquakes in the Nankai Trough

If we look at the dates of these earthquakes, they seem to happen roughly every 200 years.

6. ECDF of time between Nankai quakes

Unfortunately, plotting the ECDF does not immediately add much clarity; there simply are not many data points. This is worth pausing on, because it raises the question of how we represent ECDFs. With so few data points, it is hard to see the shape of an ECDF plotted as dots.
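For reference, the dot representation is usually built by sorting the data and pairing the *i*-th smallest value with *i*/*n*. The sketch below assumes NumPy; `ecdf` here is a hypothetical helper, not necessarily how `dcst.ecdf()` is implemented.

```python
import numpy as np

def ecdf(data):
    """Dot-style ECDF: sorted data on the x axis, i/n for the i-th smallest value."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

x, y = ecdf([3.0, 1.0, 2.0, 5.0])
# x is [1, 2, 3, 5] and y is [0.25, 0.5, 0.75, 1.0]
```

Plotting `x` against `y` with markers and no connecting line (for example, `marker='.'` and `linestyle='none'` in Matplotlib's `plt.plot()`) gives the dots.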

7. Formal ECDFs

Formally, the ECDF at *x* is defined as the fraction of data points that are less than or equal to *x*. This is defined at all positions along the *x* axis, not just those corresponding to measured data points.
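The definition translates directly into code. This sketch (hypothetical helper `ecdf_formal`, assuming NumPy) evaluates the ECDF at any *x*, including values between or outside the measured data points. `np.searchsorted` with `side='right'` on the sorted data counts how many points are less than or equal to *x*.

```python
import numpy as np

def ecdf_formal(x, data):
    """Formal ECDF evaluated at x (scalar or array): the fraction of data <= x."""
    data_sorted = np.sort(np.asarray(data, dtype=float))
    # side='right' places x after any equal values, so ties count as "<= x"
    return np.searchsorted(data_sorted, x, side='right') / len(data_sorted)

data = [1.0, 2.0, 2.0, 4.0]
print(ecdf_formal([0.0, 1.5, 2.0, 3.0, 4.0], data))
```

With these data, the ECDF is 0 for any *x* below 1, 0.75 at both *x* = 2 and *x* = 3, and 1 from *x* = 4 onward.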

8. Formal ECDFs

So, formally, we should plot the ECDF like this: as a staircase that is flat between measured values and jumps up at each one.

9. Formal ECDFs

The correspondence between the two representations is clear when the dot representation you're used to is overlaid in red.
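One way to draw the staircase yourself, assuming NumPy, is to list each sorted value twice so that a connected line rises vertically at each measurement. The helper `formal_ecdf_xy` and its `buffer` argument (how far the flat levels at 0 and 1 extend beyond the data) are hypothetical; this is not necessarily how `dcst.ecdf()` produces its formal plots.

```python
import numpy as np

def formal_ecdf_xy(data, buffer=1.0):
    """x, y coordinates that trace the formal ECDF as a staircase.

    Each sorted value appears twice in x, so a connected line rises
    vertically at that value; buffer extends the flat 0 and 1 levels
    past the smallest and largest data points."""
    x_sorted = np.sort(np.asarray(data, dtype=float))
    n = len(x_sorted)
    x = np.concatenate(([x_sorted[0] - buffer],
                        np.repeat(x_sorted, 2),
                        [x_sorted[-1] + buffer]))
    # Levels 0, 1/n, ..., 1, each repeated: one point before and one after each jump
    y = np.repeat(np.arange(n + 1) / n, 2)
    return x, y

x, y = formal_ecdf_xy([3.0, 1.0, 2.0], buffer=1.0)
```

Passing `x` and `y` to Matplotlib's `plt.plot()` draws the formal ECDF; plotting the dot representation on top reproduces the comparison described above.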

10. Formal ECDFs

The function `dcst.ecdf()` lets you plot these formal ECDFs via its `formal` keyword argument. How you display your ECDFs is a matter of opinion; both representations carry the same information. My personal preference is dots for any dataset with more than 20 data points, because the dots make each measured value easy to see. But formal ECDFs are always fine, too. Now, what if you want to compare this ECDF to the theoretical CDFs of the models for earthquake occurrence?

11. Generating theoretical distributions

First, compute the mean and standard deviation of the data; these serve as estimates of the parameters of the theoretical distributions. To get the theoretical CDFs, you can use NumPy's `random` module to draw many samples from the theoretical distributions, in this case Exponential and Normal. The ECDFs of those samples approximate the theoretical CDFs, and you can plot them alongside the data.
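The steps above might look like this with NumPy's `random` module. This is a sketch: the helper name `theoretical_samples`, the sample size, the seed, and the interval values are all hypothetical. Note that the Exponential distribution is set by its mean alone, so the standard deviation only enters the Normal model.

```python
import numpy as np

def theoretical_samples(data, size=10_000, seed=42):
    """Draw samples from Exponential and Normal models fitted to data."""
    data = np.asarray(data, dtype=float)
    mean = np.mean(data)  # sets the mean of both models
    std = np.std(data)    # used only by the Normal model
    rng = np.random.default_rng(seed)
    samples_exp = rng.exponential(scale=mean, size=size)
    samples_norm = rng.normal(loc=mean, scale=std, size=size)
    return samples_exp, samples_norm

# Hypothetical interval data (years), for illustration only
intervals = np.array([110.0, 150.0, 180.0, 210.0, 250.0])
samples_exp, samples_norm = theoretical_samples(intervals)

# Sorted samples plotted against i/n approximate each model's CDF
y_theor = np.arange(1, len(samples_exp) + 1) / len(samples_exp)
x_exp = np.sort(samples_exp)
x_norm = np.sort(samples_norm)
```

Plotting `x_exp` and `x_norm` against `y_theor` as lines, with the data's ECDF on top, gives the kind of comparison shown next.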

12. Model for Nankai Trough

Performing those operations gives this result. The timing of the Nankai megathrust earthquakes seems to follow the Gaussian model more closely than the Exponential.
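The comparison above is made by eye. One way to put a number on it, added here as an illustration rather than as part of the original analysis, is the largest vertical gap between the data's formal ECDF and each model's CDF (the quantity behind the Kolmogorov-Smirnov statistic). The sketch uses analytical CDFs instead of sampled ones; the helper names and the interval values are hypothetical.

```python
import math
import numpy as np

def max_ecdf_gap(data, cdf):
    """Largest vertical distance between the formal ECDF of data and a CDF.

    For a continuous CDF, the largest gap occurs at a data point, either at
    the top of the ECDF's jump there or just below it."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    F = np.array([cdf(xi) for xi in x])
    above = np.arange(1, n + 1) / n - F  # ECDF at x_i minus the CDF
    below = F - np.arange(n) / n         # CDF minus the ECDF just left of x_i
    return max(above.max(), below.max())

def exponential_cdf(mean):
    """CDF of an Exponential distribution with the given mean."""
    def cdf(t):
        return 1.0 - math.exp(-t / mean) if t > 0 else 0.0
    return cdf

def normal_cdf(mean, std):
    """CDF of a Normal distribution with the given mean and standard deviation."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - mean) / (std * math.sqrt(2.0))))
    return cdf

# Hypothetical interval data (years), for illustration only
intervals = np.array([110.0, 150.0, 180.0, 210.0, 250.0])
mean, std = np.mean(intervals), np.std(intervals)

d_exp = max_ecdf_gap(intervals, exponential_cdf(mean))
d_norm = max_ecdf_gap(intervals, normal_cdf(mean, std))
print(f"Exponential: {d_exp:.3f}, Normal: {d_norm:.3f}")
```

A smaller gap means a closer match. With as few data points as a record like Nankai's provides, and with the parameters estimated from the same data, such a number is descriptive only; on its own it is not a hypothesis test.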

13. Let's practice!

Now it is your turn to look at the sequence of big earthquakes around Parkfield in recent years. Recent, geologically speaking, that is!

This exercise is part of the course Case Studies in Statistical Thinking.
