1. Network Structure
In this chapter you are going to work with the Forrest Gump network dataset. You will use this to learn various methods for describing the structure and sub-structure of social networks.
Each edge of the Forrest Gump network indicates that the two characters it connects appeared together in at least one scene of the movie. Because appearing in a scene together is a mutual relationship, this network is undirected. To familiarize yourself with the network, you will first identify key vertices using eigenvector centrality and plot the network.
2. Eigenvector centrality
Eigenvector centrality is a measure of how well connected a vertex is. Vertices with the highest eigenvector centrality are those that are connected to many others, and especially to other vertices that are themselves highly connected.
In igraph, you calculate eigenvector centrality by applying the eigen_centrality() function to your graph object. The returned object is a list with several elements; the centrality scores for each vertex are stored in the element named vector.
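To see what "connected to well-connected vertices" means in practice, here is a minimal sketch in plain Python of one common way to compute these scores, called power iteration. It is an illustration only, not how igraph is implemented. The four-vertex toy network and the function name are assumptions made for this example, and the scores are rescaled so that the highest one equals 1.

```python
# Hypothetical toy network (not the Forrest Gump data):
# a triangle A-B-C, plus D attached only to C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def eigenvector_centrality(edges, iterations=100):
    """Approximate eigenvector centrality of an undirected graph by power iteration."""
    # Build an undirected adjacency list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Start with equal scores, then repeatedly replace each vertex's score
    # with the sum of its neighbors' scores.
    scores = {v: 1.0 for v in neighbors}
    for _ in range(iterations):
        new = {v: sum(scores[u] for u in neighbors[v]) for v in neighbors}
        top = max(new.values())
        # Rescale so the largest score is 1 (keeps the numbers from growing).
        scores = {v: s / top for v, s in new.items()}
    return scores


scores = eigenvector_centrality(edges)
for vertex in sorted(scores, key=scores.get, reverse=True):
    print(vertex, round(scores[vertex], 3))
```

In this toy network C scores highest because it has the most neighbors, and D scores lowest even though it is attached to C, because it has no other connections.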
As you can see in this example undirected network, the vertices A, D, E, F and H are particularly high in eigenvector centrality.
3. Density
Up to now you have largely been calculating measures of individual vertices, such as degree, betweenness and eigenvector centrality. However, there is also a family of measures that tell us something about the overall pattern or structure of a network.
The simplest measure of the overall structure of a network is its density. This is equivalent to the proportion of edges that actually do exist in a network out of all those that potentially could exist between every pair of vertices.
In these networks of 13 vertices, there could potentially be 78 edges if every pair of vertices were connected. The network on the left has 15 edges, so its density is 0.19, meaning 19% of the potential edges are present. The network on the right has 30 edges, which gives a density of 0.38, or 38% of all potential ties existing.
Density is therefore a measure of how interconnected a network is. It can simply be calculated in igraph using edge_density().
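The arithmetic behind those numbers can be checked with a short sketch in plain Python. This is only an illustration of the formula for an undirected network without self-loops, not igraph code; the function names are made up for this example.

```python
def possible_edges(n_vertices):
    """Number of distinct vertex pairs in an undirected network without self-loops."""
    return n_vertices * (n_vertices - 1) // 2


def density(n_vertices, n_edges):
    """Proportion of possible edges that are actually present."""
    return n_edges / possible_edges(n_vertices)


# The two 13-vertex example networks described above.
max_edges = possible_edges(13)
left = density(13, 15)
right = density(13, 30)

print(max_edges)        # 78
print(round(left, 2))   # 0.19
print(round(right, 2))  # 0.38
```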
4. Average path length
Another measure of the interconnectivity of a network is average path length. This is calculated by finding the shortest path between every pair of vertices in the network and taking the mean of their lengths. In igraph, you calculate this by applying the function mean_distance() to the graph and using its directed argument to state whether the graph should be treated as directed or undirected.
The network on the left has an average path length of 2.47, whereas the network on the right has an average path length of 1.81. The value is lower on the right because the additional edges create shorter routes between many pairs of vertices. For instance, the shortest path between G and K on the left is four steps: G-F-A-C-K. On the right it is two steps: G-J-K.
This suggests that the network on the right is more interconnected and facilitates flow between vertices more readily.
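To make the calculation concrete, here is a minimal sketch in plain Python that finds shortest paths with a breadth-first search and averages them. It is an illustration rather than igraph's implementation. The two small networks are hypothetical, not the ones in the figure, and this sketch simply skips any pair of vertices with no path between them.

```python
from collections import deque
from itertools import combinations


def mean_distance(edges):
    """Mean shortest-path length over all connected vertex pairs (undirected)."""
    # Build an undirected adjacency list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    def distances_from(source):
        # Breadth-first search: number of steps from source to each reachable vertex.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        return dist

    all_dist = {v: distances_from(v) for v in neighbors}
    # Count each unordered pair once; unreachable pairs are left out.
    lengths = [
        all_dist[s][t]
        for s, t in combinations(sorted(neighbors), 2)
        if t in all_dist[s]
    ]
    return sum(lengths) / len(lengths)


# Hypothetical example: a chain A-B-C-D, then the same chain with A-D added.
sparse = [("A", "B"), ("B", "C"), ("C", "D")]
denser = sparse + [("A", "D")]

sparse_mean = mean_distance(sparse)
denser_mean = mean_distance(denser)
print(round(sparse_mean, 2), round(denser_mean, 2))
```

Adding the single edge A-D shortens the trip between A and D from three steps to one, so the average drops, which is the same effect described for the two example networks.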
5. Let's practice!
Now it's your turn.