1. Case study!
Great job making it this far! I hope you've been having fun with the coding exercises and have been learning lots of new concepts. In Chapter 4, we're going to do one in-depth case study to help you consolidate everything you've learned so far. Before we go on,
2. Data
let me first describe what the network data you will be working with looks like. This dataset is a GitHub user collaboration network. GitHub is a social coding site where users collaborate on code in repositories. In this network, nodes are users, and an edge indicates that two users are collaborators on at least one GitHub repository. By the end of the exercises, you will have accomplished the following: first, you will have analyzed the structure of the graph, including its basic properties. Second, you will have visualized the graph using nxviz. Finally, you will have built a simple recommendation system. A recommendation system in a social network recommends that users "connect" with one another in some fashion. In the GitHub context, we will try writing a recommender that suggests users who should collaborate. Before I launch you into the exercises,
3. Graph properties
let me first give you a quick recap of the functions that you might need. Recall from the first chapter some basic functions for getting a graph's size. If we have a graph G in memory, we can get the number of edges and the number of nodes in the graph with len(G.edges()) and len(G.nodes()). Here, we can see that there are 29 edges connecting 20 nodes. Are you able to recall what the function names are
4. Graph properties
for computing the degree and betweenness centralities of each node in the graph? Pause the video for a few seconds... For degree centrality, did you get nx.degree_centrality(G)? Likewise, the function for getting the betweenness centrality is nx.betweenness_centrality(G). Both return a dictionary in which each key is a node and each value is that node's centrality score.
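To make the recap concrete, here is a minimal sketch of the size and centrality calls on a tiny toy graph. The graph and the user names u1 through u4 are made up for illustration; they are not the GitHub dataset from the exercises.

```python
import networkx as nx

# Hypothetical toy graph standing in for the GitHub collaboration network.
# Nodes are users; an edge means two users share at least one repository.
G = nx.Graph()
G.add_edges_from([("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u2", "u4")])

# Graph size: number of nodes and number of edges.
n_nodes = len(G.nodes())
n_edges = len(G.edges())

# Both centrality functions return a dict mapping node -> score.
deg_cent = nx.degree_centrality(G)
bet_cent = nx.betweenness_centrality(G)

print(n_nodes, n_edges)
print(deg_cent["u2"])
print(bet_cent["u2"])
```

In this toy graph, u2 is connected to every other user, so its degree centrality is the highest possible, and it is the only node that sits on shortest paths between other users.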
5. Data
In the coming exercises, you will be characterizing the size of the node and edge lists, and plotting the degree and betweenness centrality distributions.
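As a rough idea of what plotting those distributions can look like, here is a sketch that draws a histogram of each centrality score with matplotlib. It assumes matplotlib is available and uses networkx's built-in karate club graph as a stand-in, since the GitHub graph is loaded for you in the exercises; the exercises themselves may plot things differently.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Stand-in graph (assumption): the exercises use the GitHub network instead.
G = nx.karate_club_graph()

# The centrality functions return dicts; the distributions are just the values.
deg_values = list(nx.degree_centrality(G).values())
bet_values = list(nx.betweenness_centrality(G).values())

# One histogram per centrality measure, side by side.
fig, (ax_deg, ax_bet) = plt.subplots(1, 2, figsize=(8, 3))
ax_deg.hist(deg_values)
ax_deg.set_title("Degree centrality")
ax_bet.hist(bet_values)
ax_bet.set_title("Betweenness centrality")
fig.tight_layout()
```

Call plt.show() afterwards to display the figure, or fig.savefig(...) to write it to a file.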
6. Let's practice!
Alright, let's get hacking!