Understanding network structures

1. Network Randomizations

2. Random graphs

Although it is straightforward to calculate network metrics such as average path length, it is not always clear how meaningfully high or low these values are without some context. To help infer meaning from these metrics, network scientists often compare them against random graphs. Let's first describe what a random graph is. Essentially, a random graph is a network that is randomly generated by an algorithm to share a set of characteristics with the original network. For instance, the simplest random graph you could generate would be one that has the same number of vertices as your original graph and approximately the same density. Let's say the network on the left is your original network. The network on the right is one random graph produced by the algorithm. They both have the same number of vertices and approximately the same density. To generate such a random network in igraph you can use the erdos.renyi.game() function. The first argument is the number of vertices in the network - you can use gorder() for this. The second argument, p.or.m, should be set to the density of the original graph. The type argument should be set to 'gnp'. With this type, each possible edge is included independently with probability p, which is why the density matches only approximately.
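As a minimal sketch of the call described above: the lesson's original network is not available here, so this example assumes Zachary's karate club graph (bundled with igraph via make_graph("Zachary")) as a stand-in. The object names g and g.random are illustrative only.

```r
library(igraph)

# Stand-in for "your original network" (an assumption for illustration):
# Zachary's karate club graph, which ships with igraph.
g <- make_graph("Zachary")

n <- gorder(g)        # number of vertices in the original network
d <- edge_density(g)  # density of the original network

# One random graph with the same number of vertices and
# approximately the same density as the original.
g.random <- erdos.renyi.game(n = n, p.or.m = d, type = "gnp")

gorder(g.random)        # same vertex count as g
edge_density(g.random)  # close to d, but it varies from run to run
```

Recent igraph releases mark erdos.renyi.game() as deprecated and offer sample_gnp() for the same purpose, so you may see a deprecation warning depending on your installed version.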

3. Random graphs & randomization tests

If you rerun the random graph algorithm many times you will notice that it produces a different network graph each time. This is particularly useful when you wish to determine whether some property of your original network - say, average path length - is unusual or noteworthy. You might think that the average path length appears to be particularly low or high. To check, you could calculate the average path length for, say, 1000 random networks that are based on your original network. Then you can see whether the value from your original network is particularly different from those produced by the random networks. This is the basic principle of network randomization tests.

4. Generating 1000 random graphs

Shown here is some code for producing 1000 random networks in a for loop. Each network will have the same number of vertices and approximately the same density as the original network. The 1000 networks are stored in the list object 'gl'. To calculate the average path length for each random network in the list 'gl', you can use lapply() to apply the function mean_distance() to each network in the list. Here, these values are stored as the object gl.apls.
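The slide code is not reproduced in this transcript, so the following is a sketch of what such a loop might look like, not the lesson's exact code. It again assumes the karate club graph as a stand-in for the original network; the set.seed() call and the unlist() wrapper (which turns the lapply() result into a plain numeric vector) are additions for convenience.

```r
library(igraph)

# Stand-in original network (an assumption for illustration).
g <- make_graph("Zachary")

set.seed(42)  # optional: makes the random graphs reproducible

# Pre-allocate a list and fill it with 1000 random graphs that match
# the original's vertex count and, approximately, its density.
gl <- vector("list", 1000)
for (i in 1:1000) {
  gl[[i]] <- erdos.renyi.game(n = gorder(g),
                              p.or.m = edge_density(g),
                              type = "gnp")
}

# Average path length of each random graph, collected as a numeric vector.
gl.apls <- unlist(lapply(gl, mean_distance, directed = FALSE))

summary(gl.apls)
```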

5. Comparing to the original network

The easiest way to compare your original network's average path length to those of the random networks is to plot the data. You can make a simple plot of the distribution of average path lengths from the random networks using hist(). Adding a red dotted line with abline() that crosses the x-axis at the original network's average path length allows you to make a direct comparison. Here it is clear that the average path length observed in the original network is likely very typical for a network of its size and density. You will now use these randomization methods to draw inferences about the average path length in the Forrest Gump network.
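A sketch of the comparison plot, under the same assumptions as before: the karate club graph stands in for the original network, and the names obs.apl and gl.apls are illustrative. The result for this stand-in graph will not necessarily match what the slide shows for the lesson's network.

```r
library(igraph)

# Stand-in original network (an assumption for illustration).
g <- make_graph("Zachary")
obs.apl <- mean_distance(g, directed = FALSE)  # observed average path length

set.seed(42)  # optional: makes the random graphs reproducible

# Average path lengths of 1000 random graphs of matching size and density.
gl <- lapply(1:1000, function(i) {
  erdos.renyi.game(n = gorder(g), p.or.m = edge_density(g), type = "gnp")
})
gl.apls <- unlist(lapply(gl, mean_distance, directed = FALSE))

# Histogram of the random values; xlim is widened so the observed
# value is visible even if it falls outside the random distribution.
hist(gl.apls,
     xlim = range(c(gl.apls, obs.apl)),
     main = "Average path length in random graphs",
     xlab = "Average path length")

# Red dotted vertical line at the original network's value.
abline(v = obs.apl, col = "red", lty = 3, lwd = 2)
```

If the red line sits well inside the histogram, the observed value is typical for a network of that size and density; if it falls in a tail or outside the bars, the value is unusual.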

6. Let's practice!

Now it's your turn.
