Social network metrics

1. Social network metrics

In this lesson, we discuss how social networks can be summarized in a descriptive way using various social network metrics.

2. Geodesic

The geodesic is the shortest path between two nodes. In our example network, the geodesic between nodes A and I is shown by the blue edges. Note that edge weights can also be taken into account when calculating the geodesic. In fraud detection, you could calculate the geodesic between a given node and a fraudulent node. The closer a node is to the fraudulent node, the bigger the fraudulent node's influence and impact on it.
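As a rough illustration of the idea, the sketch below finds one geodesic in an unweighted network with a breadth-first search. The edge list is a made-up example, not the network shown on the slide, and the helper names `build_adjacency` and `geodesic` are hypothetical. A weighted network would need a different algorithm, such as Dijkstra's.

```python
from collections import deque

# Hypothetical undirected, unweighted network (NOT the one from the slide).
EDGES = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E"), ("E", "C"), ("C", "F")]


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbors mapping."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def geodesic(adjacency, source, target):
    """Return one shortest path from source to target, or None if unreachable."""
    previous = {source: None}  # also serves as the set of visited nodes
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


adjacency = build_adjacency(EDGES)
path = geodesic(adjacency, "A", "F")
print(path, "length:", len(path) - 1)
```

The length of the geodesic is the number of edges on the path, which is one less than the number of nodes it visits.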

3. Degree = number of edges

The degree of a node represents the number of edges or connections. In our example, node A has 2 connections,

4. Degree = number of edges

node B has 2 connections,

5. Degree = number of edges

node C has 1 connection,

6. Degree = number of edges

and node D has 3 connections. For our network, the maximum degree equals 3. More generally, the maximum degree possible for a network with N nodes is N-1. The normalized degree can be obtained by dividing the degree by the maximum degree possible. In our case, we can see that node D has a normalized degree of 1 since it has the maximum number of connections.
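The degree calculation can be sketched in a few lines. The edge list below is an assumption: it is one layout that reproduces the degrees quoted in the lesson (A and B with 2, C with 1, D with 3), and the slide itself may be drawn differently.

```python
# Assumed edge list: one layout consistent with the degrees quoted in the lesson.
EDGES = [("A", "B"), ("A", "D"), ("B", "D"), ("C", "D")]


def degrees(edges):
    """Count how many edges touch each node of an undirected network."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


degree = degrees(EDGES)
n = len(degree)  # number of nodes, N

# Normalized degree: divide by the maximum possible degree, N - 1.
normalized = {node: d / (n - 1) for node, d in degree.items()}

for node in sorted(degree):
    print(node, degree[node], round(normalized[node], 3))
```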

7. Closeness

Closeness measures how near a node is to all other nodes in the network. It is based on the distances from the node to every other node, where each distance is the length of the geodesic, or shortest path. For a network with N nodes, the maximum closeness is obtained when a particular node is directly connected to all other N-1 nodes: the distances then sum to N-1, resulting in a closeness of 1 divided by N-1.

8. Closeness

9. Closeness

10. Closeness

11. Closeness

12. Closeness

Hence, we can calculate the normalized closeness by dividing the closeness by its maximum of 1 divided by N-1, which amounts to multiplying it by N-1, or in our case 3. We see again that node D has maximum closeness. If a fraudulent node has a high value for closeness, then fraud might easily spread through the network and contaminate the other nodes.
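A minimal sketch of the closeness calculation follows, taking closeness to be 1 divided by the sum of geodesic distances to all other nodes. The four-node edge list is an assumption chosen so that node D is connected to every other node; the helper names are hypothetical, and the code assumes a connected network.

```python
from collections import deque

# Assumed four-node network in which D is connected to all other nodes.
EDGES = [("A", "B"), ("A", "D"), ("B", "D"), ("C", "D")]


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbors mapping."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def distances_from(adjacency, source):
    """Geodesic distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


adjacency = build_adjacency(EDGES)
n = len(adjacency)

closeness = {}
normalized = {}
for node in adjacency:
    total = sum(distances_from(adjacency, node).values())  # sum of geodesic lengths
    closeness[node] = 1 / total
    normalized[node] = (n - 1) / total  # closeness divided by its maximum 1/(N-1)

for node in sorted(adjacency):
    print(node, round(closeness[node], 3), round(normalized[node], 3))
```

In this assumed network, node D reaches every other node in one step, so its distances sum to 3 and its normalized closeness is 1.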

13. Betweenness

The betweenness counts the number of times that a node or edge occurs in the geodesics of the network. Here you can see a simple example of how the betweenness can be calculated.

14. Betweenness

Nodes A and E do not lie between any two other nodes. Hence, their betweenness is 0.

15. Betweenness

Node B lies between A,C; A,D and A,E, resulting in a betweenness of 3.

16. Betweenness

Node C has the highest betweenness since it lies between A,D; A,E; B,D and B,E, resulting in a betweenness of 4.

17. Betweenness

Finally, Node D lies between E,C; E,B and E,A, resulting in a betweenness of 3.
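The counts above are consistent with a simple chain A-B-C-D-E, which is what the sketch below assumes (the slide's drawing is not reproduced here). For every pair of nodes it checks which other nodes lie on a geodesic between them. When several geodesics tie, the sketch credits each intermediate node with the fraction of geodesics passing through it, which is a common convention rather than something stated in the lesson.

```python
from collections import deque
from itertools import combinations

# Assumed chain network, consistent with the counts given in the lesson.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbors mapping."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def bfs_counts(adjacency, source):
    """Distances from source and the number of geodesics to each node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                paths[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                paths[neighbor] += paths[node]
    return dist, paths


def betweenness(adjacency):
    """Unnormalized node betweenness, counting each unordered pair once."""
    info = {node: bfs_counts(adjacency, node) for node in adjacency}
    score = {node: 0.0 for node in adjacency}
    for s, t in combinations(sorted(adjacency), 2):
        dist_s, paths_s = info[s]
        dist_t, paths_t = info[t]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adjacency:
            if v in (s, t) or v not in dist_s:
                continue
            # v is on a geodesic exactly when the detour through v costs nothing.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return score


score = betweenness(build_adjacency(EDGES))
for node in sorted(score):
    print(node, score[node])
```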

18. Featurization

We have seen how new features can be created and added to the data. The idea of adding network features to the data is commonly called featurization. Featurization refers to mapping network characteristics into features and combining them with the local variables. From our research results we can conclude that a combination of both types of features leads to the best performing models. The extended dataset can now be analyzed with the various supervised methods that you have seen in other DataCamp courses, which can make use of these additional sources of information. However, before doing that, let us take a closer look at how we can deal with the imbalance problem in chapter 3.
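To make featurization concrete, here is a small sketch that joins two network features, the degree and the geodesic distance to a known fraudulent node, onto each node's local variables. Everything in it is illustrative: the `amount` values, the edge list, and the choice of node C as the fraudulent node are made up for the example, and the helper names are hypothetical.

```python
from collections import deque

# Made-up local variables per node (illustrative values only).
LOCAL_VARIABLES = {
    "A": {"amount": 120.0},
    "B": {"amount": 75.5},
    "C": {"amount": 980.0},
    "D": {"amount": 40.0},
}
# Assumed network and assumed known fraudulent node.
EDGES = [("A", "B"), ("A", "D"), ("B", "D"), ("C", "D")]
KNOWN_FRAUD_NODE = "C"


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbors mapping."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def distances_from(adjacency, source):
    """Geodesic distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


adjacency = build_adjacency(EDGES)
dist_to_fraud = distances_from(adjacency, KNOWN_FRAUD_NODE)

# One row per node: local variables plus the network features.
rows = []
for node in sorted(LOCAL_VARIABLES):
    row = {"node": node, **LOCAL_VARIABLES[node]}
    row["degree"] = len(adjacency.get(node, ()))
    row["dist_to_fraud"] = dist_to_fraud.get(node)  # None if unreachable
    rows.append(row)

for row in rows:
    print(row)
```

Each resulting row could then be passed to a supervised model alongside the label.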

19. Let's practice!

First, though, it is your turn to enrich a dataset with network features and test whether this increases the ability to detect fraud.
