
Zooming in & zooming out: Overall graph summary

1. Zooming in & zooming out: Overall graph summary

Awesome work! Thus far, we’ve been looking at different types of graphs. One thing I hope is becoming clear is that we’re able to investigate graphs

2. Graph exploration at scales

at multiple scales. On one hand, we can look at the global view of a graph, where we characterize the overall distribution of statistics over all nodes in the graph, such as the “centrality” distributions. On the other hand, we can also look at the “local” view of a graph, where we look at individual nodes and the local connectivity structures formed by them. You've done this before when looking at individual nodes and using the local structure to make recommendations, say, for connecting people together in a social network. The same idea becomes especially useful when looking at the evolution of a graph over time.
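To make the two scales concrete, here is a minimal sketch using NetworkX. The toy graph and its node names are made up for illustration and are not part of the course data.

```python
import networkx as nx

# Hypothetical toy social graph; the node names are illustrative only.
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"),
    ("alice", "carol"),
    ("alice", "dave"),
    ("bob", "carol"),
    ("dave", "erin"),
])

# Global view: one statistic computed for every node in the graph,
# which can then be summarized as a distribution (e.g. a histogram).
deg_cent = nx.degree_centrality(G)
centrality_values = sorted(deg_cent.values())

# Local view: a single node and its immediate neighborhood.
node = "alice"
neighbors = sorted(G.neighbors(node))

print(centrality_values)
print(node, "is connected to", neighbors)
```

The global view yields one number per node, while the local view yields the specific nodes around the one you care about.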

3. Zooming on nodes

In general, this is done by finding nodes of interest, zooming in on them, and then plotting a statistic of the node, such as its degree or betweenness centrality, as a function of time. Now we’re going to do a quick walk-through

4. Summarizing evolving node statistics

of the structure of the code you’ll eventually be writing, which will enable you to investigate nodes of interest and their evolving statistics. Let’s assume again that we have a customer-product dataset, and we’re interested in finding out how purchasing patterns have changed over time. After analyzing the graph over time at a high level, we may have found that customer1 belongs to an interesting cluster of nodes, and we want to do a deep dive over time. Assuming we have the graph objects already in memory

5. Summarizing evolving node statistics

as the variable Gs, we first instantiate a variable called noi (standing for “node of interest”), which corresponds to customer1. Say, for example, we’re interested in the degree of customer1, which corresponds to the number of products that customer1 purchased at every time point. We start by instantiating an empty list called degs. After that, we loop over all of the time-point graphs, and append the number of neighbors of customer1 in each graph to the degs list. After that, if you use matplotlib to plot the degs,

6. Summarizing evolving node statistics

you will get the time-series evolution of the degree of that node! Now, we could have accomplished the same thing simply by looking at the table of purchase records over time, so what advantages do graphs offer? Well, degree may be computable from the flat table, but other graph-theoretic metrics, such as betweenness centrality, require the connectivity of the entire graph to be known before they can be computed. These metrics show the advantage of using graph objects. Let's now introduce
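The walk-through on the preceding slides can be sketched as follows. The names Gs, noi, and degs come from the narration. The purchase data, the product names, and the btws list are hypothetical, and the sketch assumes that customer1 appears in every time-point graph (otherwise G.neighbors raises an error).

```python
import networkx as nx

# Hypothetical purchase records, one edge list per time point.
purchases_by_time = [
    [("customer1", "productA")],
    [("customer1", "productA"), ("customer1", "productB"),
     ("customer2", "productA")],
    [("customer1", "productA"), ("customer1", "productB"),
     ("customer1", "productC")],
]

# Build one graph per time point and keep them in a list called Gs.
Gs = []
for edges in purchases_by_time:
    G = nx.Graph()
    G.add_edges_from(edges)
    Gs.append(G)

noi = "customer1"  # node of interest

# Degree over time: the number of neighbors of noi in each graph.
degs = []
for G in Gs:
    degs.append(len(list(G.neighbors(noi))))

# Betweenness centrality needs the whole graph at each time point,
# so it cannot be read off a flat purchase table as easily as degree.
btws = [nx.betweenness_centrality(G)[noi] for G in Gs]

print(degs)
print(btws)
```

To see the time series, pass degs (or btws) to matplotlib, for example with plt.plot(degs) after import matplotlib.pyplot as plt.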

7. Default dictionaries

default dictionaries, which you will be using in the following exercises. With a defaultdict, we pass the default type of the values we’re interested in storing as an argument to the defaultdict constructor. Let’s say we found that the betweenness centrality of heathrow over two time points was 0.31 and 0.84. We simply have to call d['heathrow'].append(value) for each value. Showing what d looks like, it is of type defaultdict(list), the keys are the strings containing the airport name, and the values are the time-series betweenness centrality values. Now, let’s say we tried

8. Default dictionaries

doing the same with regular dictionaries. If we tried using the same syntax, we would get a KeyError, because the key 'heathrow' doesn’t exist in the dictionary. Unlike regular dictionaries, which first require that we instantiate the key with an empty list, defaultdicts allow us to assume the existence of an empty value, making our code much more concise and elegant.
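The comparison can be sketched in a few lines. The variable d and the values 0.31 and 0.84 follow the narration. The names plain and raised are introduced here only for illustration.

```python
from collections import defaultdict

# defaultdict(list): a missing key is created with an empty list on first access.
d = defaultdict(list)
d["heathrow"].append(0.31)
d["heathrow"].append(0.84)
print(d)

# A regular dict raises KeyError when appending to a key that does not exist.
plain = {}
try:
    plain["heathrow"].append(0.31)
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# With a regular dict, the key must first be set to an empty list.
plain["heathrow"] = []
plain["heathrow"].append(0.31)
plain["heathrow"].append(0.84)
```

Both containers end up holding the same values. The defaultdict simply removes the initialization step, which matters most when keys are discovered inside a loop.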

9. Let's practice!

Okay! Let’s now go on to the next exercises!