Introduction to graph differences

1. Introduction to graph differences

Well done! I hope the last chapter was fun and informative. We’re now going to move on to another topic that’s useful and interesting: the analysis of evolving graphs. Before we continue, let’s motivate it by talking a bit

2. Time series analysis

about time series analysis. Let's first cover some basic concepts. The first is that with time series analysis, we are often most interested

3. Time series analysis

in seeing how some number (or summary statistic), say the average node degree in a graph, changes as a function of time. As such, we keep time series records, with the goal of plotting them to visualize whether there’s some trend (upwards or downwards or flat) over time. The second is that we are sometimes also interested in the rate of change of things over a sliding window of time. With discrete time points, that time window might be one time point and the adjacent one right after, or it might encompass a few time points at one time. It’s intuitive that we might want to track things like our weight over time or our stock investment portfolio value over time. But how do we investigate evolving graphs over time? Before we continue, what’s an example of an evolving graph?

4. Evolving graphs

Well, one example is a communication network. Not everybody is communicating with everybody at every single time, which means that communication networks are dynamically changing over time. When analyzing evolving graphs, we can make one of two assumptions: either only the edges are changing over time, and we assume the nodes stay constant, or that both the edges and nodes are changing over time. The first scenario is easier to analyze, and even though in reality, sometimes nodes show up and sometimes don’t, by assuming that their presence is constant while their edges may not be simplifies the analysis.

5. Graph differences

Graph differences are an essential way to compare graphs that are changing, and because graphs are comprised of two sets (a node set and an edge set), if the node set doesn’t change, then only changing the edge set will result in a change in the graph. We can thus use

6. Graph differences

set differences to map out changes between two graphs in time. The mechanics of this in Python look like this: in comparing two sets, the left set of customers and the right set of customers, we want to know what items are present in the left set that are absent in the right set. In time-series land, this would be equivalent to seeing what was removed from the left timepoint set in transitioning to the right timepoint set. This kind of difference is "non-symmetric". Graph differences operate using this "non-symmetric" difference. NetworkX provides a dot difference(G1, G2) function. This function assumes that the node sets of graphs G1 and G2 are identical, but that the edges might not necessarily be so. So in

7. Graph differences in Python

the code example above, say we have two graphs G1 and G2, one with edges ('cust1', 'cust2') and ('cust3', 'cust2'), and the other with edges ('cust1', 'cust3') and ('cust3', 'cust2'), if we want to see which edges were added to G1 to form G2, we would do nx dot difference(G2, G1); if we wanted to see which edges were removed from G1 to form G2, we would do nx dot difference(G1, G2).

8. Let's practice!

In the following code exercises, you will be analyzing a college messaging dataset, where you will see how the graph changes on a month-by-month basis. Go and have fun with it!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.