1. Evolving graph statistics
Great work on the first set of exercises for chapter 3. In the next set of exercises, we’re going to continue exploring the concept of evolving graph statistics.
2. Evolving graph statistics
What do we mean by evolving graph statistics? Well, basically it boils down to graph summary statistics and how they change over time. For example, the number of nodes that participate in at least one edge, or the number of edges, are summary statistics that might change over time, as they do in communication networks. Alternatively, we might look at graph-theoretic summary statistics, such as how the degree distribution or the centrality distribution changes over time. Evolving graph statistics are useful in time-series analysis.
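As a minimal sketch of the simplest kind of evolving statistic, the Python below works directly on an edge list and counts, for each time period, the number of edges and the number of distinct nodes involved in at least one edge. The `(sender, recipient, month)` tuple layout, the `stats_per_period` helper and the sample data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical edge list for a communication network: (sender, recipient, month)
edges = [
    ("a", "b", 1), ("a", "c", 1),
    ("b", "c", 2), ("c", "d", 2), ("a", "d", 2), ("d", "e", 2),
    ("e", "a", 3),
]

def stats_per_period(edge_list):
    """Return {period: (n_edges, n_nodes)}, where n_nodes counts nodes in at least one edge."""
    n_edges = defaultdict(int)
    nodes = defaultdict(set)
    for u, v, t in edge_list:
        n_edges[t] += 1          # every row of the edge list is one edge
        nodes[t].update((u, v))  # a set keeps each node only once per period
    return {t: (n_edges[t], len(nodes[t])) for t in sorted(n_edges)}

for period, (m, n) in stats_per_period(edges).items():
    print(f"month {period}: edges={m}, nodes={n}")
```

In this toy data, month 2 has twice as many edges as month 1, which is the kind of jump in edge count you would look for when hunting for bursts of activity.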
3. Evolving graph statistics
For example, in a communication network, we might be interested in bursts of activity, which show up as spikes in the total number of edges in the network. When computing evolving graph statistics, because there are
4. Evolving graph statistics
natural tabular representations of graphs (recall edge lists), for simple metrics (such as the number of edges over time) it might be easier to perform the analysis directly on the edge list data. On the other hand, for graph-theoretic metrics, you may want to use the graph objects in memory that you’ve been working with. In the following exercises,
5. Cumulative distribution
we will ask you to plot the cumulative distribution of degree centralities over time, which is a very compact way of representing the distribution of values in a dataset. As you can see, if we were to plot the distribution of some graph statistic over time (for example, the shortest path length distribution or the degree centrality distribution), we might need multiple histograms, one per time period, or we could plot
6. Cumulative distribution
the empirical cumulative distribution, or ECDF, of the data over time on a single plot. The x-axis position of each point on the ECDF is a value from the data, and the y-axis position tells us the fraction of values less than or equal to that x-value. For example,
7. Cumulative distribution
when y equals 0-point-5, half of the dataset lies at or below the corresponding x-value; that x-value is known as the median, or 50th percentile. This representation avoids binning bias, makes it easy to read off percentiles, and provides a clearer way of visualizing multiple distributions on one plot. In the DataCamp statistics curriculum, you can learn how to construct and plot the empirical cumulative distribution from data; for this course, the function will be provided to you, and I simply want you to be aware of what the ECDF is plotting.
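The course provides its own ECDF function, so what follows is only an independent, minimal sketch in plain Python of what such a function computes. The names `ecdf` and `degree_centrality` and the two monthly snapshots are hypothetical. Degree centrality here is degree divided by n − 1, the same normalization networkx's `nx.degree_centrality` uses, and the sketch assumes undirected graphs without self-loops. Each snapshot yields one set of (x, y) points, and each set is one curve on a shared ECDF plot.

```python
from bisect import bisect_right
from collections import defaultdict

def ecdf(values):
    """Return (x, y): sorted data and, for each x, the fraction of data <= x."""
    x = sorted(values)
    n = len(x)
    # bisect_right counts how many sorted values are <= v, so ties get the same y
    y = [bisect_right(x, v) / n for v in x]
    return x, y

def degree_centrality(edge_list):
    """Degree / (n - 1) per node of an undirected graph given as (u, v) pairs (no self-loops)."""
    neighbors = defaultdict(set)
    for u, v in edge_list:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

# Hypothetical monthly snapshots of a communication network
snapshots = {
    1: [("a", "b"), ("a", "c")],
    2: [("b", "c"), ("c", "d"), ("a", "d"), ("d", "e")],
}

for month, edge_list in snapshots.items():
    x, y = ecdf(degree_centrality(edge_list).values())
    print(month, list(zip(x, y)))
```

To overlay the distributions, each month's `x` and `y` could be passed to matplotlib's `plt.plot(x, y, marker=".", linestyle="none", label=...)` on the same axes. The median can then be read off each curve where y reaches 0.5.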
8. Let's practice!
Alrighty, let’s get some practice!