Introduction to statistical seismology and the Parkfield experiment

1. Introduction to statistical seismology

As someone who is motivated to learn data science skills, you know that statistical inference is a crucial tool for learning about almost anything quantifiable. In this case study, you will use statistics to learn about the earth beneath you as we study earthquakes from a statistical perspective.

2. California moves and shakes

My home state of California is one of the most seismically active regions in the world. Here is a map of the southern two-thirds of the state, where most of the population is. Shown in red are faults in the Earth's crust that have slipped in recent history. The very long one is the famous San Andreas fault.

3. The Parkfield region

The Parkfield region, marked by the black box, is a portion of the San Andreas fault of particular interest to seismologists.

4. The Parkfield region

Let's take a look at it. On this map, each blue dot represents the location of the epicenter of an earthquake of at least magnitude 4 that has happened since 1950. You will spend this chapter taking a statistical approach to study these earthquakes. At the center is the tiny town of Parkfield, which has a population of about 20

5. The Parkfield region

and a cafe that has this in front of it. Before you take your hacker stats skills to the earthquake capital of the world, I need to bring you up to speed on some seismology.

6. Seismic Japan

As an illustrative example, we will consider earthquakes of magnitude five or higher that happened in the 1990s in Japan. To characterize the magnitudes of these earthquakes, we can plot the magnitudes as an ECDF.
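If you want to see what computing an ECDF involves, here is a minimal sketch. The function name `ecdf` and the handful of magnitudes are assumptions made up for illustration; they are not the Japanese catalog.

```python
import numpy as np


def ecdf(data):
    """Return the x and y values of the empirical CDF of a 1-D data set."""
    # x: the measurements, sorted from smallest to largest
    x = np.sort(np.asarray(data, dtype=float))
    # y: fraction of measurements less than or equal to each x
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Made-up magnitudes for illustration only (not real catalog data)
mags = np.array([5.3, 5.0, 6.1, 5.2, 5.7, 5.1])
x, y = ecdf(mags)
```

Plotting `y` against `x` as points gives the kind of ECDF shown on the next slide.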

7. ECDF of magnitudes, Japan, 1990-1999

Now, when we eyeball the ECDF, the magnitudes look to be Exponentially distributed, with one exception. Up until now, every time you encountered an Exponential distribution, the lowest value was close to zero. Here, it is five. The magnitude five, in this case, is a location parameter.

8. Location parameters

A location parameter defines the shift of a distribution function along the x-axis. Equivalently, defining *m* as the magnitude, we could say that the green ECDF is for the variable m-prime = m minus five. In the context of statistical seismology, the location parameter is called the completeness threshold, denoted m-sub-t. The reason for this name will become clear in a moment. Whether or not we have a nonzero location parameter, the distribution is still Exponential.
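To make the shift concrete, here is a small sketch using synthetic data. The threshold of five comes from the example above; the value of `mean_excess` and the helper `exp_cdf` are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

mt = 5.0            # completeness threshold, acting as the location parameter
mean_excess = 0.45  # hypothetical mean of m - mt, chosen for illustration

# Synthetic magnitudes: an Exponential distribution shifted to start at mt
m = mt + rng.exponential(scale=mean_excess, size=10_000)

# The shifted variable m' = m - mt is Exponential with its lowest values near zero
m_prime = m - mt


def exp_cdf(x, scale, loc=0.0):
    """CDF of an Exponential distribution with mean `scale`, shifted by `loc`."""
    return 1.0 - np.exp(-(x - loc) / scale)


# Same curve, just slid along the x-axis
cdf_shifted = exp_cdf(m, mean_excess, loc=mt)
cdf_unshifted = exp_cdf(m_prime, mean_excess)
```

The two arrays of CDF values agree, which is just another way of saying that the location parameter moves the distribution without changing its shape.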

9. The Gutenberg-Richter Law

This, in fact, seems to be a general feature of earthquakes, and has a name: the Gutenberg-Richter Law. It states that the magnitudes of earthquakes in a given region over a given time period are Exponentially distributed. This is quite convenient for characterizing earthquake magnitudes because the Exponential distribution has a single parameter, the optimal value of which is given by the mean of the measured earthquake magnitudes minus the completeness threshold.

10. The b-value

For historical reasons, seismologists do not use this threshold-subtracted mean directly, but the mean times the natural logarithm of ten. The result is a measure of seismicity, called the b-value. We can compute it for Japan in the 1990s, and we get a value of about one. Indeed, most seismically active regions have a b-value right around one.
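Here is a sketch of that calculation, following the definition just given in this lesson: the mean magnitude minus the completeness threshold, times the natural logarithm of ten. The function name `b_value` and the synthetic magnitudes are assumptions for illustration; this is not the Japanese data.

```python
import numpy as np


def b_value(mags, mt):
    """b-value as defined in this lesson: (mean magnitude - mt) * ln(10)."""
    mags = np.asarray(mags, dtype=float)
    # Only use magnitudes at or above the completeness threshold
    m = mags[mags >= mt]
    return (np.mean(m) - mt) * np.log(10)


# Synthetic magnitudes (not real data), built so the b-value is close to one
rng = np.random.default_rng(seed=1)
synthetic = 5.0 + rng.exponential(scale=1 / np.log(10), size=5000)

b = b_value(synthetic, mt=5.0)
```

Because the synthetic magnitudes were drawn with a mean excess of one over the natural log of ten, `b` comes out near one, up to sampling noise.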

11. ECDF of all magnitudes

Let's now look at the ECDF of all detected Japanese earthquakes in the 1990s. For low magnitude earthquakes, we see a strong departure from Exponentiality.

12. ECDF of all magnitudes

This is called *roll-off*, and is due to the fact that lower-magnitude earthquakes are difficult to detect. Now you see where the name "completeness threshold" comes from.

13. Completeness threshold

It is the magnitude above which we can measure all earthquakes.
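To see why the choice of threshold matters, here is a rough simulation of roll-off. Everything in it is an assumption for illustration: the synthetic "true" catalog, the logistic detection model, and the two thresholds compared. Real detection behavior depends on the seismic network.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical "true" catalog: Exponential magnitudes with a b-value of one
true_mags = rng.exponential(scale=1 / np.log(10), size=200_000)

# Assumed detection model: small quakes are often missed, and
# essentially everything above about magnitude 2 is detected
p_detect = 1.0 / (1.0 + np.exp(-6.0 * (true_mags - 1.0)))
detected = true_mags[rng.random(true_mags.size) < p_detect]


def b_value(mags, mt):
    """b-value as defined in this lesson: (mean magnitude - mt) * ln(10)."""
    m = mags[mags >= mt]
    return (np.mean(m) - mt) * np.log(10)


# Threshold set too low: the roll-off region distorts the estimate
b_too_low = b_value(detected, mt=0.0)

# Threshold set above the roll-off: the estimate is close to one again
b_complete = b_value(detected, mt=2.0)
```

Comparing `b_too_low` with `b_complete` shows how including magnitudes below the completeness threshold throws off the estimate, which is why you restrict the analysis to magnitudes above it.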

14. Let's practice!

Now enjoy your foray into statistical seismology of the Parkfield region!