Summary and final thoughts

1. Summary and final thoughts

Congratulations! You made it to the end! In this course, we started with simple edgelists, turned them into labeled social networks, and set out to accurately predict the labels of nodes whose labels are unknown. Along the way, you learned about homophily and how to measure relational dependency in a network. You became familiar with various types of network features, how to extract them from the network, and how to transfer them into a flat dataset. Finally, you used the features to predict unknown labels and measured how good your models were.

2. Labeled networks

Starting from an edgelist like this one, which represents the relationships between customers, we used the igraph package to turn it into a network like this one. Because we had additional information about which customers had left the company, or churned, we could label the customers in the network accordingly. In this network, the churned customers are colored red.

3. Homophily

In chapter two, you learned how to measure the relational dependency between the nodes in a network. This is an important step, since it helps us determine whether the connections in the network can be used for predictive modeling. When the labels are randomly distributed in the network, we cannot use its structure to predict the labels. We discussed how to compute two measures on the network: first, dyadicity, the connectedness between nodes with the same label, and second, heterophilicity, the connectedness between nodes with opposite labels. We illustrated how they relate to homophily, the social phenomenon of people tending to associate with those they perceive as being similar to themselves. In the fraud network you see here, the green nodes, which represent legitimate credit card transactions, are highly connected among themselves, and the same holds for the red nodes, which represent fraudulent transactions. This network shows very clear signs of homophily.
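To make the two measures concrete, here is a minimal sketch in plain Python rather than the R and igraph code used in the course. It assumes the common definition of both measures as the ratio of observed to expected edge counts, where the expected counts come from the overall connectance of the network. The function name, the toy edgelist, and the labels are all made up for illustration.

```python
def dyadicity_heterophilicity(edges, labels, target=1):
    """Return (dyadicity, heterophilicity) for an undirected, labeled network.

    Assumed definitions (not taken verbatim from the course):
      dyadicity       = observed same-label edges among `target` nodes / expected
      heterophilicity = observed cross-label edges / expected
    where the expectation assumes labels are spread randomly over the nodes.
    """
    n = len(labels)                      # number of nodes
    m = len(edges)                       # number of edges
    n1 = sum(1 for lab in labels.values() if lab == target)
    n0 = n - n1
    if n1 < 2 or n0 < 1:
        raise ValueError("need at least two target nodes and one other node")

    # Connectance: probability that two randomly chosen nodes are connected.
    p = 2 * m / (n * (n - 1))

    # Observed edge counts.
    m11 = sum(1 for u, v in edges
              if labels[u] == target and labels[v] == target)
    m10 = sum(1 for u, v in edges
              if (labels[u] == target) != (labels[v] == target))

    # Expected edge counts under random labeling.
    expected_m11 = n1 * (n1 - 1) / 2 * p
    expected_m10 = n1 * n0 * p

    return m11 / expected_m11, m10 / expected_m10


# Hypothetical toy network: A, B, C carry label 1; D, E, F carry label 0.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("C", "D")]
toy_labels = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0, "F": 0}

dyadicity, heterophilicity = dyadicity_heterophilicity(toy_edges, toy_labels)
print(f"dyadicity={dyadicity:.2f}, heterophilicity={heterophilicity:.2f}")
```

Under these assumed definitions, a dyadicity above 1 together with a heterophilicity below 1 means that same-label nodes connect more often, and opposite-label nodes less often, than random labeling would produce, which is the pattern described for the fraud network.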

4. Network Featurization

The functions of the igraph package can be used to compute several network attributes. We did this in chapter three, where we featurized the churn network. We extracted simple network features, centrality features, and link-based features, as well as PageRank scores, and added them to the network object. Here you can see what a labeled network object looks like before it is featurized. It has two node attributes, name and label. To compute the degree of the nodes in the network, you simply call the `degree` function, and you can add the result to the network as a node attribute using the `V` function. After the network is featurized, it has multiple other node attributes, as you can see here.
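The idea of featurizing a labeled network and then flattening it can be sketched without igraph. The plain-Python example below is only an analogue of what the course does in R: it computes the degree of each node and, as one possible link-based feature, the number of neighbors with label 1, and returns one row per node. The function name, the feature name `churn_neighbors`, and the toy data are hypothetical.

```python
from collections import defaultdict


def featurize(edges, labels):
    """Return one dict per node with its name, label, and two simple features."""
    # Build an adjacency structure from the undirected edgelist.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    rows = []
    for node in sorted(labels):
        nbrs = neighbors[node]
        rows.append({
            "name": node,
            "label": labels[node],
            # Degree: number of distinct neighbors.
            "degree": len(nbrs),
            # Link-based feature (illustrative): neighbors carrying label 1.
            "churn_neighbors": sum(1 for n in nbrs if labels[n] == 1),
        })
    return rows


# Hypothetical toy network: A, B, C have churned (label 1); D, E, F have not.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("C", "D")]
toy_labels = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0, "F": 0}

flat = featurize(toy_edges, toy_labels)
for row in flat:
    print(row)
```

Each row plays the role of one line in the flat dataset: the node name and label, followed by the feature columns.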

5. Model building and evaluation

In the final chapter, we showed you how to turn the featurized network object into a flat dataset using the `as_data_frame` function. The result is a dataframe that can be used in classical predictive modeling. We discussed some data cleaning techniques, such as dealing with missing values and removing correlated variables. Note, however, that other techniques exist as well and are equally important. Then we split the data into a training set and a test set, and trained both a logistic regression model and a random forest model on the training set. Using the test set, we evaluated model performance with two measures, AUC and top-decile lift. As a result, we could compare the performance of the different models in order to select the best one.
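As a reminder of what the two evaluation measures capture, here is a minimal plain-Python sketch. It assumes AUC is computed as the fraction of positive/negative pairs that the model ranks correctly (ties count half), and that top-decile lift is the positive rate among the 10% highest-scored observations divided by the overall positive rate. The function names, labels, and scores are invented for illustration and are not results from the course.

```python
def auc(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_decile_lift(y_true, scores):
    """Positive rate in the top 10% of scores relative to the overall rate."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    k = max(1, len(ranked) // 10)            # size of the top decile
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Hypothetical test set: 20 observations, 4 of them positive (label 1).
y_test = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
# Made-up model scores, strictly decreasing from 1.0 down to 0.05.
model_scores = [1 - i / 20 for i in range(20)]

model_auc = auc(y_test, model_scores)
model_lift = top_decile_lift(y_test, model_scores)
print(f"AUC={model_auc:.4f}, top-decile lift={model_lift:.1f}")
```

Computing both numbers for each candidate model on the same test set is what makes the comparison between logistic regression and random forest possible.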

6. Congratulations!

Networks exist everywhere. Whenever objects can be linked in some way, whether by friendship, interaction, or similarity, a network is obtained. We hope you have learned some useful techniques in this course that you can apply in your own network applications. Thank you for joining us.
