Individual-level network metrics

1. Node-level metrics

Now that you know how to load and visualize Twitter data, let's look at a few ways to analyze this data quantitatively. For this section, we're going to focus on a few node-level network metrics.

2. Centrality: node importance

Say we'd like to understand who the most important user in the network is, or find the group of most influential users, those who are the elites of communication. This is where the concept of centrality comes in. Centrality is a family of metrics in social network analysis that attempt to quantify how important each node is in a given network. There are many different ways to do this, but we're going to focus on two types: degree centrality and betweenness centrality.

3. Degree centrality

Degree centrality is the simplest measure of centrality. It's a measure of how many edges are connected to a particular node. In a directed network, we can further decompose degree into in-degree and out-degree. Remember, Twitter networks are directed, which means that each edge points one way and is not necessarily reciprocated. In-degree is the number of edges going into a node, while out-degree is the number of edges going out of the node. You can think of this as the number of times someone is retweeted versus how much retweeting they do. In this particular network visualization, node sizes are proportional to the in-degree, which could represent how many times the user is retweeted.
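To make in-degree and out-degree concrete, here is a minimal sketch using networkx. The toy graph, the user names, and the convention that an edge (a, b) means "a retweeted b" are assumptions for illustration, not the course dataset.

```python
import networkx as nx

# Hypothetical toy retweet network: an edge (a, b) means "a retweeted b".
G_rt = nx.DiGraph()
G_rt.add_edges_from([
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "erin"),
    ("alice", "erin"),
])

# Raw counts of inbound and outbound edges for every node.
in_deg = dict(G_rt.in_degree())
out_deg = dict(G_rt.out_degree())

# Normalized in-degree: each count divided by (number of nodes - 1).
in_cent = nx.in_degree_centrality(G_rt)

print("carol retweeted by:", in_deg["carol"], "users")
print("carol retweeted:", out_deg["carol"], "users")
print("carol in-degree centrality:", in_cent["carol"])
```

Under that edge convention, carol has a high in-degree because three users retweeted her, while her out-degree is low because she retweeted only one user.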

4. Betweenness centrality

Betweenness centrality measures how many shortest paths between pairs of nodes need to pass through any given node. Think of an airport. It doesn't have to have many inbound planes to be important, but if it connects cities from many parts of the world to each other, it would have high betweenness centrality.
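The airport idea can be sketched with a small hypothetical graph in which two groups of nodes are connected only through a single node, here called `hub`. The graph and node names are made up for illustration; `nx.betweenness_centrality` is the networkx function for this measure.

```python
import networkx as nx

# Hypothetical toy network: group "a" can reach group "b" only via "hub".
G = nx.DiGraph()
G.add_edges_from([
    ("a1", "a2"), ("a1", "hub"), ("a2", "hub"),
    ("hub", "b1"), ("hub", "b2"), ("b1", "b2"),
])

# Share of shortest paths between other node pairs that pass through each node.
bc = nx.betweenness_centrality(G)

# The node with the largest betweenness value.
top = max(bc, key=bc.get)
print(top, bc[top])
```

Note that `hub` has an in-degree of only two, yet every shortest path from the "a" group to the "b" group runs through it, so it gets the highest betweenness.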

5. Printing highest centrality

For node-level metrics, we'll often want to see which users have the highest value for a particular type of centrality. Who's being retweeted the most, or who is bridging discussion networks? Doing this is straightforward with pandas. We first calculate our metric, in this case, betweenness centrality, which gives us a dictionary mapping each user to a score. We then store it in a data frame. We have to use the `items` method to get the name of the user and the metric as pairs. We'll also pass an argument for column names. We can then sort using the `sort_values` method.
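The steps just described might look like the following sketch. The toy graph, the variable names, and the column names `screen_name` and `betweenness` are assumptions chosen for illustration.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy retweet network in which "hub" bridges two groups.
G_rt = nx.DiGraph()
G_rt.add_edges_from([
    ("a1", "a2"), ("a1", "hub"), ("a2", "hub"),
    ("hub", "b1"), ("hub", "b2"), ("b1", "b2"),
])

# 1. Calculate the metric: a dict of {user: betweenness}.
bc = nx.betweenness_centrality(G_rt)

# 2. Store it in a data frame; items() yields (user, value) pairs.
bc_df = pd.DataFrame(list(bc.items()), columns=["screen_name", "betweenness"])

# 3. Sort so the highest-centrality users come first.
bc_sorted = bc_df.sort_values("betweenness", ascending=False)
print(bc_sorted.head())
```

The same pattern should work for any metric that networkx returns as a dictionary keyed by node, such as `nx.in_degree_centrality`.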

6. Centrality in different networks

The meaning of each centrality measure in Twitter networks depends highly on which network you are looking at. For retweet networks, high in-degree centrality signals someone who gets retweeted a lot. High out-degree centrality signals someone who does a lot of retweeting. And high betweenness centrality means someone bridges different types of topical or ideological communities. Meanwhile, in reply networks, high in-degree centrality signals someone who gets a lot of inbound messages, which could signal either agreement or disagreement. High out-degree centrality can signal someone who gets into many discussions. And high betweenness centrality may signal someone who bridges several different discussion communities.

7. The ratio

A last node-level metric that's particular to Twitter is the Reply-to-Retweet ratio, also just called the ratio. While this has yet to be systematically shown in scientific study, a popular Twitter belief is that a user or even a single tweet with a high ratio may signify that the user or their tweet is deeply unpopular due to users replying in disagreement. We can calculate this by creating an in-degree data frame for each of the reply and retweet networks. We then join the two data frames into a single data frame using the `merge` method. Lastly, we can calculate the ratio by dividing the number of replies by the number of retweets.
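Here is a minimal sketch of the ratio calculation on two made-up networks. The graphs, user names, column names, and merge suffixes are all assumptions for illustration. Dropping users with zero retweets before dividing is also an added assumption, used here to avoid division by zero.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy networks. Edge (a, b): "a retweeted b" / "a replied to b".
G_rt = nx.DiGraph([
    ("u1", "pol"), ("u2", "pol"),
    ("u1", "news"), ("u2", "news"), ("u3", "news"), ("u4", "news"),
])
G_reply = nx.DiGraph([
    ("u1", "pol"), ("u2", "pol"), ("u3", "pol"), ("u4", "pol"),
    ("u1", "news"),
])

# One in-degree data frame per network.
column_names = ["screen_name", "degree"]
rt_df = pd.DataFrame(list(G_rt.in_degree()), columns=column_names)
reply_df = pd.DataFrame(list(G_reply.in_degree()), columns=column_names)

# Join on the user name; suffixes tell the two degree columns apart.
ratio = rt_df.merge(reply_df, on="screen_name", suffixes=("_rt", "_reply"))

# Assumption: skip users who were never retweeted to avoid dividing by zero.
ratio = ratio[ratio["degree_rt"] > 0].copy()

# Replies received divided by retweets received.
ratio["ratio"] = ratio["degree_reply"] / ratio["degree_rt"]
ratio = ratio.sort_values("ratio", ascending=False)
print(ratio)
```

In this toy data, `pol` receives four replies but only two retweets, so it ends up with the highest ratio.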

8. Let's practice!

In the following exercises, we're going to generate these metrics for the State of the Union dataset. Pay special attention to which users end up in the top spots for each of the metrics. Are they the same, or are they different?