
Social network based inference

1. Social network based inference

2. Social network based inference

In this video, we will discuss how to predict the behavior of a node based on the behavior of other nodes in the network.

3. Social network based inference

The challenge is that the data are not independent because the behavior of one node might influence the behavior of other nodes. Hence, the correlational behavior between the nodes needs to be considered. Furthermore, procedures for collective inference are needed because inferences about nodes can affect each other.

4. Non-relational vs relational

We can consider two types of models. A non-relational model is a typical classification model that uses only local information; no network information is taken into account. This model can be estimated using any of the traditional methods, such as logistic regression, decision trees, and so on. A relational model, on the other hand, is a classifier that uses the network links to make its classifications. We will discuss the relational neighbor classifier.

5. Relational neighbor classifier

A first example of a relational model is the relational neighbor classifier. It assumes homophily, meaning connected nodes have a propensity to belong to the same class. Homophily implies that fraudsters are more likely to be connected to other fraudsters. The relational neighbor classifier further assumes that some class labels are known.

6. Relational neighbor classifier

Here you can see an example of how the relational neighbor classifier works. The question mark node in the middle has five neighbors, of which three are non-fraudsters and two are fraudsters. The probability that the question mark node is fraudulent equals its number of links to fraudsters, which is 2, divided by its total number of links, which is 5. The probability that the question mark node is fraudulent is therefore 40%.
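
The calculation above can be sketched in a few lines of Python. This is only a minimal illustration: the function name and the label strings are assumptions made for this example and do not come from the course material.

```python
def relational_neighbor_probability(neighbor_labels, target="fraud"):
    """Return the fraction of labeled neighbors that belong to the target class."""
    if not neighbor_labels:
        raise ValueError("the node has no labeled neighbors")
    matches = sum(1 for label in neighbor_labels if label == target)
    return matches / len(neighbor_labels)


# Five neighbors: three non-fraudsters and two fraudsters, as in the example.
neighbors = ["non-fraud", "non-fraud", "non-fraud", "fraud", "fraud"]
probability = relational_neighbor_probability(neighbors)
print(probability)  # 2 / 5
```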

7. Relational neighbor classifier with weights

Now suppose the links have weights. The sum of the weights of the links with fraudsters is 3. The total sum of the weights over all links is 8. The probability that the question mark node is fraudulent is therefore 3 divided by 8.
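
The weighted version replaces link counts with sums of link weights. The sketch below assumes hypothetical individual weights, chosen only so that the fraud links sum to 3 and all links sum to 8 as stated; the transcript does not list the actual per-link values, and the function name is likewise an assumption.

```python
def weighted_relational_neighbor_probability(weighted_links, target="fraud"):
    """Return the share of total link weight that connects to the target class.

    weighted_links is a list of (weight, neighbor_label) pairs.
    """
    total_weight = sum(weight for weight, _ in weighted_links)
    if total_weight <= 0:
        raise ValueError("total link weight must be positive")
    target_weight = sum(
        weight for weight, label in weighted_links if label == target
    )
    return target_weight / total_weight


# Hypothetical weights: fraud links sum to 3, all links sum to 8.
links = [
    (1, "fraud"),
    (2, "fraud"),
    (2, "non-fraud"),
    (2, "non-fraud"),
    (1, "non-fraud"),
]
weighted_probability = weighted_relational_neighbor_probability(links)
print(weighted_probability)  # 3 / 8
```

With every weight set to 1, this reduces to the unweighted count of fraudulent neighbors divided by the number of neighbors.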

8. Relational neighbor classifier

Here we show the names of all 6 nodes and their respective fraud labels. Notice that nodes B and D are fraudulent. We also have the weights of the edges.

9. Relational neighbor classifier

To find the probability that the question mark node is fraudulent, we first create a subnetwork consisting of the question mark node and the two fraud nodes, B and D. Next, we use the function "strength" to sum the weights of the edges adjacent to the question mark node, once in the original network and once in the subnetwork. Dividing the subnetwork sum by the original network sum gives the estimated probability of fraud.
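
The subnetwork procedure can be mimicked in plain Python as a rough sketch. Here `strength` and `induced_subnetwork` are simple stand-ins written for this example, not the library functions used in the course. The node names A, C, and E, the label "?" for the question mark node, and the individual edge weights are all hypothetical; only the fraud nodes B and D and the sums 3 and 8 come from the transcript.

```python
def strength(edges, node):
    """Sum the weights of all edges adjacent to the given node."""
    return sum(weight for (u, v), weight in edges.items() if node in (u, v))


def induced_subnetwork(edges, nodes):
    """Keep only the edges whose two endpoints are both in the given node set."""
    keep = set(nodes)
    return {
        (u, v): weight
        for (u, v), weight in edges.items()
        if u in keep and v in keep
    }


# Hypothetical weighted edges between the question mark node and its neighbors.
edges = {
    ("?", "A"): 2,
    ("?", "B"): 1,  # B is fraudulent
    ("?", "C"): 2,
    ("?", "D"): 2,  # D is fraudulent
    ("?", "E"): 1,
}

# Subnetwork with the question mark node and both fraud nodes.
fraud_subnetwork = induced_subnetwork(edges, ["?", "B", "D"])

# Ratio of the two strengths: 3 in the subnetwork, 8 in the original network.
fraud_probability = strength(fraud_subnetwork, "?") / strength(edges, "?")
print(fraud_probability)
```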

10. Let's practice!

Now it's your turn to build a relational model!