Introduction to spatial data

1. Introduction to spatial data

Welcome to Working with Geospatial Data in R. Perhaps your first question is: What makes data spatial?

2. What is spatial data?

It's quite simple: whenever our data is associated with **locations**, we think of it as spatial. When those locations are on the Earth, we'll often be a little more specific and call the data geospatial. A location is most unambiguously described by a set of coordinates together with a description of the coordinate system being used. You are already familiar with one coordinate reference system: geographical coordinates, the usual latitude-longitude pairs we use to specify where something is on the globe.

3. House sales in Corvallis

In this chapter you'll be working with 2015 house sales from Corvallis, the town where I live. When a house sells, the most obvious data

4. House sales in Corvallis

is the price and the address of the house, but there may also be data on how big the house is, how many bedrooms and bathrooms it has, how much land it sits on, and what condition it is in. All this data is associated with a house at a specific location, so we can consider it spatial data.

5. House sales in Corvallis

Although the address specifies the location, it isn't a set of coordinates. To work with these house sales we need to convert the address

6. House sales in Corvallis

to a set of coordinates.

7. House sales in Corvallis

Latitude specifies the position North to South, and longitude the position East to West.

8. House sales in Corvallis

Often the North/South and East/West abbreviations are dropped, and signed numbers are used instead: positive for North latitude and East longitude, negative for South latitude and West longitude. Because in math we are so used to specifying location as x and y, you'll also often see longitude listed first, since it specifies the horizontal, or x, coordinate.
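To make the sign convention concrete, here is a minimal sketch in base R. The helper name `to_signed()` is hypothetical, and the coordinates are an approximate, illustrative location for Corvallis rather than values taken from the course data.

```r
# Hypothetical helper: turn a coordinate given with a hemisphere letter
# into a signed decimal value. North and East are positive;
# South and West are negative.
to_signed <- function(value, hemisphere) {
  stopifnot(hemisphere %in% c("N", "S", "E", "W"))
  direction <- ifelse(hemisphere %in% c("N", "E"), 1, -1)
  direction * abs(value)
}

# Approximate, illustrative coordinates for Corvallis
lat <- to_signed(44.56, "N")   # positive: north of the equator
lon <- to_signed(123.26, "W")  # negative: west of the prime meridian

# Longitude first, matching the (x, y) convention
c(x = lon, y = lat)
```

Listing longitude first here mirrors the x-then-y ordering that plotting functions expect.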

9. House sales in a data frame

Let's take a look at these house sales as you'll see them in a data frame in R. Each row is a sale. The latitude and longitude columns describe the location of the sale, and all the other columns describe other attributes of that sale: the sale price, the number of bedrooms, and so on. When data is associated with point locations, that is, each observation has a single set of spatial coordinates, we describe it as point data. You'll see some other types of spatial data in the last section of this chapter.
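As a rough illustration of that layout, the sketch below builds a tiny data frame by hand. The column names (`lon`, `lat`, `price`, `bedrooms`) and every value in it are made up for this example and are not the course's actual data.

```r
# A toy point data set: one row per sale, with invented values.
# Column names are assumptions chosen for illustration.
sales <- data.frame(
  lon      = c(-123.28, -123.26, -123.25),  # x: East/West position
  lat      = c(44.57, 44.56, 44.59),        # y: North/South position
  price    = c(267500, 255000, 295000),     # sale price
  bedrooms = c(5, 3, 3)                     # number of bedrooms
)

# Inspect the structure: two coordinate columns plus attribute columns
str(sales)
```

Each row pairs one set of coordinates with the attributes of that sale, which is what makes this point data.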

10. Displaying spatial data with ggplot2

When spatial data is stored in a data frame, you already have tools to display it. For example, using ggplot2 we can plot all the house sale locations by mapping the x position to longitude, and the y position to latitude. This doesn't look like much of a map, because it is missing spatial cues, like terrain and roads, that help orient us when we are looking at spatial data.
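A minimal sketch of such a plot follows. It assumes ggplot2 is installed, and it uses a small invented data frame (hypothetical `lon` and `lat` columns) in place of the real house sales.

```r
library(ggplot2)

# Invented coordinates standing in for the house sale locations
sales <- data.frame(
  lon = c(-123.28, -123.26, -123.25),
  lat = c(44.57, 44.56, 44.59)
)

# Map longitude to x and latitude to y, and draw one point per sale
p <- ggplot(sales, aes(x = lon, y = lat)) +
  geom_point()

p
```

The result is an ordinary scatter plot: the points are in the right relative positions, but nothing in the background says where they are.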

11. The ggmap package

In this chapter we'll use the ggmap package to add these spatial cues to our plots. The ggmap package downloads maps from web services and adds them as a layer in ggplot2 plots. Here's an example of getting a map of New York City. We know the location in latitude and longitude. We use the get_map() function to download the map, passing in the location and a zoom parameter. Then we pass the result to the ggmap() function which displays the map.
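The steps just described might look like the sketch below. The wrapper `plot_basemap()` is a hypothetical helper, and the New York City coordinates are approximate. It also rests on assumptions beyond the transcript: downloading needs network access, and recent ggmap versions default to Google Maps, which requires an API key registered with `register_google()`. For that reason the download only runs when ggmap is installed and a key is available.

```r
# Longitude first, then latitude (approximate centre of New York City)
nyc <- c(lon = -74.0059, lat = 40.7128)

# Hypothetical helper: download a basemap centred on `center`
# with get_map(), then display it with ggmap().
plot_basemap <- function(center, zoom = 10) {
  basemap <- ggmap::get_map(location = center, zoom = zoom)
  ggmap::ggmap(basemap)
}

# Only attempt the download when ggmap is installed and a key is registered
if (requireNamespace("ggmap", quietly = TRUE) && ggmap::has_google_key()) {
  print(plot_basemap(nyc, zoom = 10))
}
```

Larger `zoom` values show a smaller area in more detail, so `zoom = 10` gives roughly a city-level view.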

12. Let's practice!

Over the next few exercises you'll see how we can combine the ggmap() function with our original ggplot2 plot to add some context to our map of house sales. But first, you'll play with ggmap to learn where Corvallis is.