
Basic Network features

1. Basic Network features

In the previous chapters, you learned about labeled networks and how to construct them from edgelists and customer dataframes. We also showed you how to determine whether a labeled network is homophilic by measuring its heterophilicity and dyadicity. Knowing this matters before you embark on a predictive modeling venture with networked data, because homophily means that the labels of connected nodes depend on each other. In this chapter, you will learn how to extract features from the network that can be used for predictive modeling. We start with some basic network features.

2. Neighborhood features

First, we compute the degree of a node, that is, the size of its first order neighborhood. Let's look at node I in the network of data scientists. You can see the nodes in its first order neighborhood here. These are F, G, H, and J, so I has degree 4. Use the function `degree` in the `igraph` package to compute the degree of the nodes in the network, as seen here. By also counting all nodes that are connected to the nodes in the first order neighborhood, we get the second order neighborhood. This is the second order neighborhood of node I. Its size is 7 and it includes the nodes C, D, E, F, G, H, and J. To find the size of the second order neighborhood, use the function `neighborhood.size` and specify `order=2`, as you can see in the `R` code here.
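Since the slide code is not part of this transcript, here is a minimal sketch of both neighborhood features. It assumes a small hypothetical toy network (five nodes A to E, not the data scientists network from the slides). It uses `ego_size`, the current `igraph` name for `neighborhood.size`. Note that this function counts the node itself by default, so the sketch sets `mindist = 1` to count only the other nodes.

```r
library(igraph)

# Hypothetical toy edgelist (NOT the data scientists network from the slides)
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "E")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# First order neighborhood size: number of direct neighbors of each node
first_order <- degree(g)

# Second order neighborhood size: nodes reachable in at most two steps.
# mindist = 1 excludes the node itself from the count.
second_order <- ego_size(g, order = 2, mindist = 1)
names(second_order) <- V(g)$name

print(first_order)
print(second_order)
```

In this toy network, node C has three direct neighbors (A, B, and D) and reaches E in two steps, so its second order neighborhood has size 4.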

3. Neighborhood features - triangles

A triangle in a network consists of three nodes that are all connected to each other. It indicates that the individuals form a closely connected group. Node I in the data scientists network forms a triangle with nodes H and J, as you can see in the figure on the left. On the right, you can see that I does not form a triangle with nodes F and J, because F and J are not connected to each other. We count the number of triangles that each node is a part of using the function `count_triangles`, as seen in the `R` code here. For example, node A is part of 4 triangles.
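As a rough illustration of triangle counting, the sketch below applies `count_triangles` to a hypothetical toy network (again not the network from the slides), in which A, B, and C form the only triangle.

```r
library(igraph)

# Hypothetical toy edgelist: A, B and C are all connected to each other
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "E")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Number of triangles each node is part of, labeled by node name
triangles <- count_triangles(g)
names(triangles) <- V(g)$name

print(triangles)
```

Nodes D and E get a count of 0 here, because none of their neighbors are connected to each other.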

4. Centrality features

Next, we consider centrality features. They measure the impact nodes have on the whole network. First, betweenness counts how often the shortest path between two nodes goes through a given node, which gives an indication of how much information passes through that node. Look, for example, at the nodes A and B. Node B is never on the shortest path between two other nodes, whereas A is on the shortest path between nodes B and E. A therefore has higher betweenness than B. Second, by counting how many steps are required to get from a node to every other node in the network, we get an indication of how easily that node reaches the other nodes. This is called closeness. Here you can see the `R` code to compute betweenness and closeness.
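A minimal sketch of both centrality features follows, assuming a hypothetical toy network rather than the one on the slides. With default arguments on an unweighted, connected graph, `closeness` in `igraph` returns the inverse of the summed distances to all other nodes, so fewer steps give a higher value.

```r
library(igraph)

# Hypothetical toy edgelist (NOT the data scientists network from the slides)
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "E")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Betweenness: number of shortest paths between other nodes that pass through a node
btw <- betweenness(g)

# Closeness: 1 / (sum of shortest-path distances to all other nodes)
cls <- closeness(g)

print(btw)
print(cls)
```

In this toy network, C lies on every shortest path from A or B to D or E, so it has the highest betweenness. It also needs the fewest steps in total to reach everyone, so it has the highest closeness.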

5. Transitivity

Finally, transitivity, or the clustering coefficient, measures how likely it is that two neighbors of a node are also connected to each other. It is obtained by dividing the number of triangles a node is part of by the number of triads centered on that node. A triad is formed by three nodes and two edges that meet at the middle node. If the third edge is absent, the triad is a triangle where one of the edges is missing. On the left, you can see the triangles of node I, and on the right one of its triads, formed by the nodes F, I, and J. If F and J were connected, this would be a triangle, as indicated by the dotted line. There are two types of transitivity: global, computed for the whole network, and local, computed for each node separately. Here is the `R` code to compute local transitivity.
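The sketch below shows one way to compute both types of transitivity with the `transitivity` function, assuming a hypothetical toy network instead of the one on the slides. Nodes with fewer than two neighbors have no triads, so by default their local transitivity is `NaN`.

```r
library(igraph)

# Hypothetical toy edgelist (NOT the data scientists network from the slides)
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "E")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Local transitivity: one value per node
local_trans <- transitivity(g, type = "local")
names(local_trans) <- V(g)$name

# Global transitivity: one value for the whole network
global_trans <- transitivity(g, type = "global")

print(local_trans)
print(global_trans)
```

Node C, for example, has three pairs of neighbors but only one of them (A and B) is connected, which gives a local transitivity of 1/3.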

6. Let's practice!

Now let's compute some features for the churn network!