
Labeled networks and network learning

1. Labeled networks and network learning

In the last video, you became familiar with labeled networks and with using the network to predict the labels of nodes when they are unknown. It is time to get started on your own labeled network. You will be working on a customer network with the objective of predicting churn, that is, identifying the customers who are most likely to terminate their contract with the company. Let's take a look at the dataset again.

2. Churn prediction in social networks

Here you see the first few rows of the customer dataframe, with each customer's id and churn indicator: 1 means that the customer churned and 0 that the customer did not churn. You can also see the first lines of the customer edgelist that was used to construct the network, with customer ids in each of the two columns. On the right is the corresponding network, with churn nodes colored red and non-churn nodes colored white. The goal of the exercises is to predict who is most likely to churn in the future, based on the current situation. By using the social network we are assuming that churn is a social phenomenon: being connected to someone who churned implies an increased probability of also churning. We will also talk about churn influence, meaning that the churners in the network influence others to churn as well. Under that assumption, node 393 would have the highest churn probability in this case.
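Before the exercises, it may help to see how such a network can be built and colored in R. The sketch below assumes the customer dataframe is called `customers` (first column the customer id, second column the churn indicator) and the edgelist dataframe is called `edges`; these names are illustrative assumptions, not taken from the course.

```r
# A minimal sketch, assuming `customers` has columns (id, churn) and `edges`
# has two columns of customer ids -- these names are assumptions.
library(igraph)

# Build an undirected customer network; extra columns of `customers`
# (here: churn) become vertex attributes
network <- graph_from_data_frame(edges, directed = FALSE, vertices = customers)

# Color churners red and non-churners white, as in the plot on the slide
V(network)$color <- ifelse(V(network)$churn == 1, "red", "white")

plot(network, vertex.label = NA, vertex.size = 5)
```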

3. The Relational Neighbor Classifier

Let's start by looking at a simple network learning technique to infer labels. It is called the relational neighbor classifier and assigns a label based on the labels of neighboring nodes, assuming that linked nodes have a propensity to share the same label. We will demonstrate this using the network of data scientists and focus again on Cecilia. We see that she has four neighbors: A, B, D, and G. Three of them prefer R and one prefers Python, that is, 75% of the linked data scientists prefer R and 25% prefer Python. Based on this information alone we could say that Cecilia is more likely to be an R user.
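As a quick illustration, the snippet below computes that fraction for a single node. It assumes an igraph object `g` for the data scientists network whose vertices are named A through G and carry a `language` attribute with values "R" or "Python"; the object and attribute names are assumptions.

```r
library(igraph)

# Cecilia's neighbors in the data scientists network (A, B, D and G)
nb <- neighbors(g, "C")

# Fraction of her neighbors that prefer R: 3 out of 4, i.e. 0.75
mean(nb$language == "R")
```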

4. The Relational Neighbor Classifier

Here you can see the R code to compute the relational neighbor score relative to R for everyone in the network simultaneously. We have two vectors, `rNeighbors` and `pNeighbors`, with everyone's number of neighbors that prefer R and Python respectively. The values in the vectors have the same order as the people in the network, that is, the first value corresponds to A, the second value to B and so on. So to compute the ratio of neighbors that prefer R, we simply divide `rNeighbors` by the sum of `rNeighbors` and `pNeighbors`. Note that the sum of the two vectors equals each node's total number of neighbors. And here you can see the result. Under the assumptions of the relational neighbor classifier, these are the posterior class probabilities of each node belonging to the class R. As you can see, the probability of node C is 75%.
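The code itself is not reproduced on this page, so here is a sketch of the computation it describes, reusing the assumed `g` object and `language` vertex attribute from above.

```r
library(igraph)

# For every node, count the neighbors that prefer R and those that prefer Python
rNeighbors <- sapply(seq_len(vcount(g)),
                     function(v) sum(neighbors(g, v)$language == "R"))
pNeighbors <- sapply(seq_len(vcount(g)),
                     function(v) sum(neighbors(g, v)$language == "Python"))

# Relational neighbor score: the fraction of each node's neighbors that prefer R
probR <- rNeighbors / (rNeighbors + pNeighbors)
probR   # for node C this gives 3 / 4 = 0.75
```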

5. Let's practice!

Now it's your turn to practice. First, you will label the churners in the customer network and visualize it. Then you will apply the relational neighbor classifier to identify the most likely churners.
