1. Exploring our data
By now you've seen that graph data rarely comes in a form that we can work with right "out of the box". In this chapter we'll be working with a dataset from the Chicago Divvy bike sharing network. To start with, we'll look at a subset of the data that Divvy makes freely available. We'll cover how to go from raw data to an igraph object, and the decisions involved in calculating graph properties like edge weights.
2. Bike data frame
Let's start by looking at the fields in the data set. Each row is an individual trip. It has the ID, name, and latitude / longitude of both the "from" station and the "to" station. There's also some metadata about each trip, including how long the trip was in seconds, user type, gender, age, and the distance in meters between stations. All we'll need to create our graph is the from and to station IDs, but the metadata will be useful in later lessons because it will allow us to compare graphs of different groups of bike network users. Let's start by creating our basic graph.
3. Creating the bike sharing graph
The first thing we'll do is group our data by from and to station IDs. Then we need to consider what we want to use for edge weights. There are a variety of things we could use. There's trip duration, so we could use average trip time, or we could use the percentage of trips made by each user type. Another intuitive feature to use as an edge weight is the number of trips between stations. That's the feature we'll be using, and we can calculate it with the n() function in dplyr.
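As a minimal sketch of that grouping step, here is how the trip counts could be computed on a tiny made-up data frame. The column names (from_station_id, to_station_id, weights) and the toy rows are assumptions for illustration, not the actual Divvy fields.

```r
library(dplyr)

# Hypothetical toy trips: one row per trip (column names are assumed)
bike_dat <- data.frame(
  from_station_id = c(1, 1, 1, 2, 2, 3),
  to_station_id   = c(2, 2, 1, 1, 3, 3)
)

# Group by origin and destination, then count the trips in each group
trip_df <- bike_dat %>%
  group_by(from_station_id, to_station_id) %>%
  summarize(weights = n(), .groups = "drop")

print(trip_df)
```

Each row of trip_df is now one station pair, and weights holds the number of trips made between that pair.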
4. Creating the bike sharing graph
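We'll use the graph_from_data_frame() function to create a graph based on the from and to station IDs. Next we'll add the edge weights as an edge attribute and quickly check the size of our graph. We can see that it has approximately 19,000 edges and 300 vertices. In other words, it's a very dense graph: with 300 vertices there are about 90,000 possible directed station pairs (counting loops), so roughly a fifth of them are connected.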
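The graph construction can be sketched as follows, again on a small made-up table of trip counts rather than the real data. The object names (trip_df, trip_g) and the attribute name weights are assumptions; gsize(), gorder(), and edge_density() are standard igraph functions for the edge count, vertex count, and density.

```r
library(igraph)

# Hypothetical pre-aggregated trip counts (one row per station pair)
trip_df <- data.frame(
  from_station_id = c(1, 1, 2, 2, 3),
  to_station_id   = c(1, 2, 1, 3, 3),
  weights         = c(1, 2, 1, 1, 1)
)

# The first two columns are read as the edge list; the graph is directed by default
trip_g <- graph_from_data_frame(trip_df[, c("from_station_id", "to_station_id")])

# Attach the trip counts as an edge attribute (edges keep the row order)
E(trip_g)$weights <- trip_df$weights

gsize(trip_g)   # number of edges
gorder(trip_g)  # number of vertices

# Fraction of possible directed edges (loops allowed) that are present
edge_density(trip_g, loops = TRUE)
```

Passing the full data frame to graph_from_data_frame() would also work, since extra columns are stored as edge attributes; assigning the attribute separately just makes the step explicit.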
5. Explore the graph
Let's look at the first 12 vertices of the graph. We'll create a subgraph and then visualize it, setting the edge width to the number of trips taken. The most obvious thing that stands out is that there are many loops in the graph. These represent trips where people took a bike out from a station and returned it to the same one. In fact, based on the edge width, we can see this is the most common kind of trip! Another thing we see is that vertices drawn closer together tend to have more trips between them than vertices drawn far apart. However, we should be careful: just because igraph draws them close together doesn't mean that they are geographically close (but we'll cover that later in the chapter).
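A rough sketch of this exploration step is shown below. It uses a made-up four-station graph and takes the first 3 vertices instead of the first 12; the object names and the weights attribute are assumptions, and the plot styling values are arbitrary.

```r
library(igraph)

# Hypothetical trip counts; the extra column becomes the edge attribute 'weights'
trip_df <- data.frame(
  from    = c(1, 1, 2, 2, 3, 4),
  to      = c(1, 2, 1, 3, 3, 1),
  weights = c(5, 2, 1, 1, 4, 1)
)
trip_g <- graph_from_data_frame(trip_df)

# Keep only the first three vertices and the edges among them
sg <- induced_subgraph(trip_g, 1:3)

# Draw the subgraph with edge width proportional to the number of trips
plot(sg,
     edge.width = E(sg)$weights,
     edge.arrow.size = 0.3,
     vertex.size = 20)

# Count the loops, i.e. trips that start and end at the same station
sum(which_loop(sg))
```

Because the layout is chosen by igraph's plotting algorithm, the positions in the drawing carry no geographic meaning.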
6. Let's practice!
Now that we've seen how we create the graph, the weights we use, and what the graph looks like, let's move on to see how the graphs of different kinds of users compare.