
Two-dimensional smooths and spatial data

1. Two-dimensional smooths and spatial data

Up until now, we have been working with models made up of one or several smooths, each of a single variable. Now, we will expand our models to include smooths of multiple variables and their interactions. This will allow us to look at new kinds of data, especially geospatial data, which are best represented by complex surfaces rather than single smooth lines.

2. Interactions

You may be familiar with the concept of interactions from linear modeling. An interaction represents the fact that the effect of one variable on the outcome depends on the value of another variable. In a linear model, interactions are generally represented by adding a term that multiplies two variables. As a result, the outcome can be higher or lower than what the sum of the two variables' separate effects would predict.
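As a minimal sketch of the linear case described above, the following fits a linear model with a product term. The data are simulated for illustration only, and the names dat and mod_lm are hypothetical.

```r
# Simulated data (illustrative only): the effect of x1 on y depends on x2
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2 + rnorm(n, sd = 0.1)
dat <- data.frame(y, x1, x2)

# In an R formula, x1 * x2 expands to x1 + x2 + x1:x2,
# i.e. two main effects plus the multiplicative interaction term
mod_lm <- lm(y ~ x1 * x2, data = dat)

# The coefficient named "x1:x2" is the interaction
coef(mod_lm)
```

The interaction coefficient tells you how much the slope of x1 changes for each unit increase in x2.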

3. Interactions in GAMs

In a GAM, the relationship between a variable and an outcome changes across the range of the smooth. Similarly, an interaction in a GAM can differ across all the values of two or more variables. We represent interactions between variables as a smooth surface, so every combination of variable values can have its own predicted value. This is also a natural way to represent spatial data.

4. Syntax for interactions

The syntax for interactions in GAMs is straightforward. To model the interaction between two variables, we put both variables inside a single s() call in the GAM formula, for example s(x1, x2).
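A minimal sketch of this syntax, assuming the mgcv package and simulated data. The names dat and mod2d are hypothetical, and method = "REML" is one reasonable choice of smoothing parameter estimation rather than a requirement.

```r
library(mgcv)

# Simulated data (illustrative only): y is a smooth surface over x1 and x2
set.seed(1)
n  <- 300
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) * cos(2 * pi * x2) + rnorm(n, sd = 0.2)
dat <- data.frame(y, x1, x2)

# Both variables go inside one s() call to form a two-dimensional smooth
mod2d <- gam(y ~ s(x1, x2), data = dat, method = "REML")

# The model contains a single smooth term with two dimensions
mod2d$smooth[[1]]$label
mod2d$smooth[[1]]$dim
```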

5. Mixing interaction and single terms

You can mix interactions with other terms, which can be linear or nonlinear. For instance, a formula such as y ~ s(x1, x2) + s(x3) has an additional nonlinear term, x3, which is separate from the interaction of x1 and x2. A formula such as y ~ s(x1, x2) + x3 + x4 has linear terms x3 and x4 instead. Just as in our previous GAMs, you can include categorical terms along with interactions and linear terms. A common way to model geospatial data is to use an interaction term of x and y coordinates, along with individual terms for other predictors. The interaction term then accounts for the spatial structure of the data.
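The two kinds of formula described above can be sketched as follows, again assuming mgcv and simulated data. The variables x3 and x4 and the object names mod_a and mod_b are hypothetical and chosen only to mirror the text.

```r
library(mgcv)

# Simulated data (illustrative only)
set.seed(2)
n  <- 300
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
x4 <- runif(n)
y  <- sin(2 * pi * x1) * cos(2 * pi * x2) + (x3 - 0.5)^2 + 0.5 * x4 +
  rnorm(n, sd = 0.2)
dat <- data.frame(y, x1, x2, x3, x4)

# Interaction of x1 and x2 plus a separate nonlinear term for x3
mod_a <- gam(y ~ s(x1, x2) + s(x3), data = dat, method = "REML")

# Interaction of x1 and x2 plus linear terms for x3 and x4
mod_b <- gam(y ~ s(x1, x2) + x3 + x4, data = dat, method = "REML")
```

In mod_b, x3 and x4 appear as ordinary parametric coefficients, while the spatial-style term s(x1, x2) remains a smooth.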

6. Interaction model outputs

When you look at the summary output of a model with interactions, you'll see that the interaction is a single smooth term. This term combines the effects of x1, x2, and their interaction in a single smooth. This differs from what you may expect in a linear model, where terms for x1, x2, and their interaction are separate. We will discuss how to fit a model that separates these components later in this chapter. Also note the high EDF (effective degrees of freedom) for this term. It takes many more basis functions, and therefore more data, to build a two-dimensional surface rather than a one-dimensional line.
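To see where the single smooth term and its EDF appear, here is a minimal sketch that assumes mgcv and simulated data. The object names are hypothetical, and the exact EDF will depend on the data.

```r
library(mgcv)

# Simulated data (illustrative only)
set.seed(3)
n  <- 300
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) * cos(2 * pi * x2) + rnorm(n, sd = 0.2)
dat <- data.frame(y, x1, x2)

mod2d <- gam(y ~ s(x1, x2), data = dat, method = "REML")

# Full summary: the smooth terms table has one row, s(x1,x2)
summary(mod2d)

# The smooth terms table on its own, including the edf column
smooth_table <- summary(mod2d)$s.table
smooth_table
```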

7. Spatial data

For exercises involving interactions in GAMs, we will use a new data set called "meuse". These are geospatial data on heavy metal soil pollution along the Meuse river in the Netherlands. The data set is a data frame with x and y coordinates, measures of heavy metals in the soil, and other spatial covariates such as elevation, distance from the river, and the land-use type at each location. For more information on the source and variables of these data, you can look at the help file for this data set in the sp package.
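A minimal sketch of loading and inspecting these data, assuming the sp and mgcv packages are installed. The model at the end is only one plausible example of the "coordinates plus other predictors" pattern; the choice of cadmium as the response and of elev and dist as predictors is an assumption made for illustration.

```r
library(mgcv)

# Load the meuse data frame from the sp package (assumes sp is installed)
data(meuse, package = "sp")

# Inspect the coordinates, metal concentrations, and covariates
str(meuse)
head(meuse[, c("x", "y", "cadmium", "elev", "dist", "landuse")])

# Illustrative model: a spatial smooth of the coordinates
# plus separate smooths for elevation and distance from the river
mod_meuse <- gam(cadmium ~ s(x, y) + s(elev) + s(dist),
                 data = meuse, method = "REML")
summary(mod_meuse)
```

Running ?meuse after library(sp) opens the help file mentioned above.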

8. Let's practice!

Now let's try some examples of two-dimensional modeling with GAMs.