Motivation: social networks and predictive analytics

1. Motivation: social networks and predictive analytics

Welcome to this DataCamp course on predictive analytics using networked data in R. Here you will learn about specific social networks called labeled networks and how to use the network structure to predict labels of unknown nodes.

2. Applications

Many social networks have labeled nodes. Think about gender and age, for example. In a network of credit card transactions, some cards are stolen and can therefore be labeled as fraudulent, like the red nodes in this network. In many such cases, it's useful to be able to label unknown nodes using predictive modeling. In this course, your goal is to predict customer churn based on a customer's social network. By churn, we mean customer defection: a customer terminates all business with a company or simply stops using its services. We then say that the customer has churned and call them a churner. Customer churn is a current problem in many businesses. To prevent it, they try to predict which customers are most likely to churn and offer them promotions.

3. Overview

In this course we will show you how to construct and label a network in R. We will also show you how to predict labels of unknown nodes directly from the network using network learning. In the second chapter, we introduce homophily and show you how to measure relational dependency to determine whether the network structure can be used for predictive modeling. Then in the third chapter, we show you how to compute and extract various network features. Finally, we will use those features to predict the labels of unknown nodes using supervised analytics techniques. But first, we look at a simple labeled network.

4. Collaboration Network

Let's assume there are ten data scientists working at DataCamp, called A, B, C, etc. These data scientists like to collaborate when they create courses for the DataCamp platform, so we link them together based on their collaborations and create a social network.

5. Collaboration Network

This is how you construct such a social network in R. The data frame `DataScienceNetwork` is an edgelist where each row represents an edge in the network. You create the network from it using the `igraph` function `graph_from_data_frame`. To plot the network, use the `plot` function. The other arguments customize how the network looks, such as the label and color of edges and nodes, and the position of the nodes, which is set using the `pos` object. Note that a node is called a vertex in the package.
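The slide's code is not reproduced in this transcript, so the following is only a minimal sketch of the steps just described. The edges in `DataScienceNetwork` are made up for illustration, and `pos` is computed here with a layout function as a stand-in for the course's own `pos` object.

```r
library(igraph)

# Hypothetical edgelist: one row per collaboration (the course's real data is not shown here)
DataScienceNetwork <- data.frame(
  from = c("A", "A", "A", "B", "B", "C", "C", "D", "E", "F", "G", "H", "I"),
  to   = c("B", "C", "D", "C", "E", "F", "G", "H", "I", "J", "J", "I", "J"),
  stringsAsFactors = FALSE
)

# Build an undirected graph; each row of the data frame becomes one edge
g <- graph_from_data_frame(DataScienceNetwork, directed = FALSE)

# Assumption: a fixed layout matrix plays the role of the `pos` object
set.seed(1)
pos <- layout_with_fr(g)

# Plot with customized vertex and edge appearance
plot(g,
     layout = pos,
     vertex.label = V(g)$name,
     vertex.label.color = "black",
     vertex.color = "white",
     edge.color = "grey50")
```

Passing the same `pos` matrix to every later `plot` call keeps the nodes in the same place, which makes it easier to compare plots once colors are added.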

6. Collaboration Network

Each of the data scientists has a preferred programming language, R or Python. We add their preferences as a node attribute called technology. Node attributes are accessed through `V(g)`, for example `V(g)$technology`. We also color the nodes depending on the preference, with blue for R and green for Python. The preferred programming language is the label of the nodes in this network. Here is the network with the nodes colored. Let's assume that the preference of Cecilia, or node C, is unknown. She is connected to three R users and one Python user. Can we use the network structure to infer what her preference is? This is what you will learn in this course!
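As a rough illustration of the labeling step, the sketch below attaches a `technology` attribute, colors the nodes, and counts the labels among C's neighbors. The edges and preferences are invented so that C has three R neighbors and one Python neighbor, and the majority vote at the end is only a simple stand-in for the idea, not necessarily the method taught later in the course.

```r
library(igraph)

# Hypothetical edgelist and labels, chosen so that C has three R neighbors and one Python neighbor
edges <- data.frame(
  from = c("A", "A", "A", "B", "B", "C", "C", "D", "E", "F", "G", "H", "I"),
  to   = c("B", "C", "D", "C", "E", "F", "G", "H", "I", "J", "J", "I", "J")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Assign the label by vertex name; C's preference is unknown (NA)
technology <- c(A = "R", B = "R", C = NA, D = "R", E = "Python",
                F = "R", G = "Python", H = "Python", I = "Python", J = "Python")
V(g)$technology <- unname(technology[V(g)$name])

# Blue for R, green for Python, white for the unknown node
V(g)$color <- ifelse(is.na(V(g)$technology), "white",
                     ifelse(V(g)$technology == "R", "blue", "green"))

# Count the labels among C's neighbors and take the most frequent one
neighbor_labels <- neighbors(g, "C")$technology
label_counts <- table(neighbor_labels)
guess <- names(which.max(label_counts))

print(label_counts)
print(guess)
```

Because the colors are stored in the `color` vertex attribute, a plain `plot(g)` call picks them up without any extra arguments.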

7. Churn Network

To build a network, you can use an edgelist, where each row represents an edge between the from and to nodes. These are the first lines of the customer edgelist that you will work with in the exercises. It has customer ids in each of the two columns. These edges represent a relationship between the customers, such as friendship, being linked on social media or a phone call connection. On the right is the corresponding network.
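A customer edgelist of this shape could look like the sketch below. The customer ids and the names `edgeList` and `network` are placeholders for illustration; the actual exercise data is not reproduced here.

```r
library(igraph)

# Hypothetical customer edgelist: each row links the customer in `from` to the customer in `to`
edgeList <- data.frame(
  from = c(1, 1, 2, 3, 4),
  to   = c(2, 3, 3, 4, 5)
)

# Inspect the first rows
head(edgeList)

# Relationships such as friendship are treated as undirected here (an assumption)
network <- graph_from_data_frame(edgeList, directed = FALSE)

# Number of customers (nodes) and relationships (edges)
vcount(network)
ecount(network)
```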

8. Let's practice!

Now, let's get started on the customer network!
