
Geospatial data

1. Geospatial Data

Welcome to the course on geospatial data! In this course, you will discover how to integrate geospatial data into your Python workflow for data science. I am Joris Van den Bossche, and my co-instructor is Dani Arribas-Bel. But before we dive into the fascinating world of manipulating and analysing spatial data, let us pause for a second and define *what* exactly geospatial data is.

2. What is Geospatial Data?

Geospatial data are data in which each record is associated with a specific location. First of all, it is data: many of the operations we will perform on geospatial data are very similar to those we would perform on non-spatial data.

3. What is Geospatial Data?

But with geospatial data, every observation has a location and can be "put on a map". This allows us to look at spatial relationships within the data. The real power of geospatial data, however, lies in combining both the data themselves and their location, which opens up many opportunities for sophisticated analysis.
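A minimal sketch of that combination, assuming a small made-up table in which each record carries hypothetical `longitude` and `latitude` columns. It answers an ordinary attribute question and a location question with plain pandas, then combines the two.

```python
import pandas as pd

# Made-up measurements: every record has a location (longitude, latitude)
stations = pd.DataFrame({
    "station": ["A", "B", "C", "D"],
    "temperature": [21.5, 18.0, 25.2, 19.8],
    "longitude": [4.35, 4.40, 5.10, 4.30],
    "latitude": [50.85, 50.80, 51.20, 50.90],
})

# Non-spatial question: which stations are warmer than 19 degrees?
warm = stations["temperature"] > 19

# Spatial question: which stations fall inside a (hypothetical) bounding box?
in_box = (stations["longitude"].between(4.0, 4.5)
          & stations["latitude"].between(50.7, 51.0))

# Combining the data themselves with their location
selected = stations.loc[warm & in_box, "station"].tolist()
print(selected)  # ['A', 'D']
```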

4. Spatial data I

Spatial data comes in all shapes and sizes. A typical example of traditional geospatial data is government census data. Here, we see a map of population density in the United States.

5. Spatial data II

Nowadays, however, new sources of spatial data are increasingly available. For example, here Dani tracked a bike ride with his smartphone.
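A track like this can be thought of as an ordered series of GPS fixes. The sketch below uses made-up coordinates (not the actual ride) and approximates the ride length by summing haversine great-circle distances between consecutive fixes. The haversine formula and the mean Earth radius of 6,371 km are choices made here for illustration, not something the course prescribes.

```python
import math

# Hypothetical GPS fixes from a ride, as (longitude, latitude) in degrees
track = [(4.35, 50.85), (4.36, 50.86), (4.37, 50.86)]

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres


def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


# Ride length: sum of the distances between consecutive fixes
ride_length = sum(haversine_m(p, q) for p, q in zip(track, track[1:]))
print(f"{ride_length:.0f} m")
```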

6. How we record the real world

In Geographic Information Science, there are two data models for recording the world. The first model is a raster, which encodes the world as a continuous surface represented by a grid, such as the pixels of an image. Prominent examples include altitude data and satellite images. The other model represents the world as a collection of discrete objects using points, lines and polygons. This is called vector data.
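To make the two models concrete, here is a sketch with invented numbers. The raster is a NumPy grid of hypothetical elevation values, georeferenced by an assumed top-left origin and cell size, while the vector objects are just explicit coordinates. Looking up the raster value under a vector point shows how the two relate.

```python
import numpy as np

# Raster: a hypothetical 3x4 grid of elevation values (metres), one per cell
elevation = np.array([
    [10, 12, 15, 20],
    [11, 14, 18, 25],
    [13, 17, 22, 30],
])

# Assumed georeferencing: top-left corner of the grid and cell size (map units)
x_origin, y_origin = 100.0, 500.0
cell_size = 10.0


def cell_centre(row, col):
    """Map coordinates of the centre of raster cell (row, col)."""
    x = x_origin + (col + 0.5) * cell_size
    y = y_origin - (row + 0.5) * cell_size  # row numbers grow downwards
    return x, y


def value_at(x, y):
    """Raster value of the cell containing map coordinate (x, y)."""
    col = int((x - x_origin) // cell_size)
    row = int((y_origin - y) // cell_size)
    return elevation[row, col]


# Vector: discrete objects stored as explicit coordinates, no grid
well = (125.0, 485.0)                                   # a point
road = [(100.0, 480.0), (120.0, 480.0), (140.0, 470.0)]  # a line

print(elevation.shape)     # (3, 4): every cell of the surface has a value
print(cell_centre(1, 2))   # (125.0, 485.0)
print(value_at(*well))     # 18
```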

7. Raster versus vector data

Here is a real-world example of the two data models for the same area. On the left, you see a thermal satellite image showing the heat loss of buildings. On the right, you see a visualization of vector data for the same area: discrete features in which buildings are represented as polygons and roads as lines. In this course, we will focus on vector data, so let's take a deeper look at it.

8. Vector features

Vector features are built from three basic types of geometry. To start, we have a point: a single location with X and Y coordinates.
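This sketch represents a point as a GeoJSON-style dictionary so that it needs no third-party packages; in GeoPandas, geometries are Shapely objects such as `Point`, so the course code will look different. A point is a single coordinate pair, and the distance helper assumes planar (projected) coordinates.

```python
import math

# A point geometry in GeoJSON style: one (x, y) coordinate pair
origin = {"type": "Point", "coordinates": [0.0, 0.0]}
corner = {"type": "Point", "coordinates": [3.0, 4.0]}

x, y = corner["coordinates"]
print(x, y)  # 3.0 4.0


def distance(p, q):
    """Straight-line distance between two points, assuming planar coordinates."""
    (x1, y1), (x2, y2) = p["coordinates"], q["coordinates"]
    return math.hypot(x2 - x1, y2 - y1)


print(distance(origin, corner))  # 5.0
```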

9. Vector features

Next, a line is a sequence of connected points. In the code, you will notice that this is called a LineString.
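As a GeoJSON-style sketch (an assumption for illustration; in GeoPandas the equivalent object is Shapely's `LineString`), a line is an ordered list of coordinate pairs, and its length in planar coordinates is the sum of its straight segments.

```python
import math

# A line: an ordered sequence of connected points (GeoJSON "LineString")
line = {"type": "LineString",
        "coordinates": [[0.0, 0.0], [3.0, 4.0], [3.0, 10.0]]}


def line_length(geom):
    """Sum of the straight segments between consecutive vertices (planar)."""
    coords = geom["coordinates"]
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))


print(line_length(line))  # 11.0  (5.0 for the first segment + 6.0 for the second)
```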

10. Vector features

Finally, a polygon is formed by a closed line that encloses an area. A single feature can also consist of multiple geometries, such as a MultiPolygon.
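A GeoJSON-style sketch of polygons with made-up coordinates: each polygon is a list of closed rings (outer boundary first, then any holes), and a MultiPolygon is a list of such polygons grouped into one feature. The area uses the shoelace formula, assuming planar coordinates; in GeoPandas code, the `.area` attribute of a Shapely geometry would be the usual route.

```python
# A polygon boundary is a closed ring: the first and last coordinates are equal
square = {"type": "Polygon",
          "coordinates": [[[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]]}


def ring_area(ring):
    """Shoelace formula: area enclosed by a closed ring of (x, y) pairs."""
    if ring[0] != ring[-1]:
        raise ValueError("ring must be closed")
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def polygon_area(rings):
    """Outer ring area minus the area of any holes (inner rings)."""
    outer, *holes = rings
    return ring_area(outer) - sum(ring_area(h) for h in holes)


# A MultiPolygon: several polygons forming one feature (e.g. a mainland plus an island)
islands = {"type": "MultiPolygon",
           "coordinates": [square["coordinates"],
                           [[[10, 10], [12, 10], [12, 12], [10, 12], [10, 10]]]]}

print(polygon_area(square["coordinates"]))                   # 16.0
print(sum(polygon_area(p) for p in islands["coordinates"]))  # 20.0
```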

11. Vector example I

Let's look at a real-world example illustrating these types of vector data: we can represent the countries of the world as polygons, as shown in this figure.

12. Vector example II

Now we add the locations of megacities as point features.

13. Vector example III

Finally, we add some of the largest rivers of the world as lines.
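The layering in these figures can be sketched with plain matplotlib, assuming made-up shapes in place of real country, city and river data: polygons are drawn with `fill`, points with `scatter` and lines with `plot`, each call adding one layer to the same axes. The output filename is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Made-up shapes standing in for the three layers
countries = [[(0, 0), (4, 0), (4, 3), (0, 3)],
             [(4, 0), (8, 0), (8, 4), (4, 3)]]
cities = [(1, 1), (6, 2)]
rivers = [[(0.5, 2.5), (3, 1.5), (7, 0.5)]]

fig, ax = plt.subplots()

# Layer 1: countries as filled polygons
for outline in countries:
    xs, ys = zip(*outline)
    ax.fill(xs, ys, facecolor="lightgrey", edgecolor="black")

# Layer 2: cities as points, with a higher zorder so they stay on top
xs, ys = zip(*cities)
ax.scatter(xs, ys, color="tab:red", zorder=3)

# Layer 3: rivers as lines
for river in rivers:
    xs, ys = zip(*river)
    ax.plot(xs, ys, color="tab:blue")

fig.savefig("world_layers.png")
```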

14. Vector attribute data

A last important concept in this first video is feature attributes. Typically, we have information about our vector features. Using the country polygons as an example, we could have the name of each country, its capital, its population, and so on. When we combine a collection of such features, for example all the countries in the world, with their attributes, we end up with a table. This is the kind of data we will work with in this course.
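A minimal sketch of such a table, using hypothetical countries and invented numbers: a plain pandas DataFrame whose attribute columns sit next to a geometry column of GeoJSON-style dictionaries. GeoPandas offers a dedicated `GeoDataFrame` for this kind of table, with Shapely geometries instead of dictionaries.

```python
import pandas as pd

# Hypothetical countries: attribute columns plus one geometry per row
countries = pd.DataFrame({
    "name": ["Country A", "Country B", "Country C"],
    "capital": ["City A", "City B", "City C"],
    "population": [5_000_000, 12_000_000, 800_000],
    "geometry": [
        {"type": "Polygon", "coordinates": [[[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]]},
        {"type": "Polygon", "coordinates": [[[4, 0], [8, 0], [8, 4], [4, 3], [4, 0]]]},
        {"type": "Polygon", "coordinates": [[[0, 3], [4, 3], [4, 6], [0, 6], [0, 3]]]},
    ],
})

# Ordinary table operations still work on the attribute columns
large = countries[countries["population"] > 1_000_000]
print(large[["name", "capital"]])
```

Each row is one feature: filtering on `population` keeps the matching geometries along with the other attributes.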

15. Let's practice!

For the exercises, we assume that you have basic knowledge of the pandas package for working with tabular data and of matplotlib for visualization. Let's do some first exercises using those packages.