Summary of homophily

1. Summary of homophily

In the last two lessons, you learned how to compute the dyadicity and heterophilicity of a network. Both parameters are necessary to capture the detailed interplay between the network structure and node properties. But how do they relate to homophily?
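For reference, the quantities behind both measures can be written as below. This is a sketch that assumes the usual definitions for an undirected network without self-loops. The network has N nodes, E edges, n_green green nodes and n_white = N - n_green white nodes. e_green and e_mixed are the observed numbers of green-green and green-white edges, and p is the connectance (edge density). The expected counts m_green and m_mixed assume edges are placed at random with probability p.

```latex
\[
p = \frac{2E}{N(N-1)}, \qquad
m_{\text{green}} = \frac{n_{\text{green}}\,(n_{\text{green}}-1)}{2}\, p, \qquad
m_{\text{mixed}} = n_{\text{green}}\, n_{\text{white}}\, p
\]
\[
D = \frac{e_{\text{green}}}{m_{\text{green}}}, \qquad
H = \frac{e_{\text{mixed}}}{m_{\text{mixed}}}
\]
```

Under random labeling both D and H are close to 1. D > 1 means the network is dyadic, and H < 1 means it is heterophobic.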

2. Can I do predictive analytics with my network?

One of the essential questions before doing predictive analytics with networked data is whether the predictive models would benefit from CNA (Complex Network Analysis). In the case of churn prediction: do the relationships between people play an important role, and is churn a contagious effect in the network? Are the two labels randomly spread over the network, or are there observable effects indicating a social phenomenon, i.e., do churners tend to cluster together? In terms of network structure, this means that connections among nodes with the same label are more common than expected, i.e., the network is dyadic, and that connections between nodes with different labels are rarer than expected, i.e., the network is heterophobic. Let's look at an example.
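As a rough sketch, the two structural conditions just described can be expressed as checks on the dyadicity D and the heterophilicity H. The sketch assumes the convention that 1 is the value expected when labels are spread at random. The function names are hypothetical and chosen for illustration only. Treating 1 as a hard threshold is also a simplification.

```python
def is_dyadic(d: float) -> bool:
    # Same-label edges occur more often than expected under random labeling.
    return d > 1


def is_heterophobic(h: float) -> bool:
    # Cross-label edges occur less often than expected under random labeling.
    return h < 1


def looks_homophilic(d: float, h: float) -> bool:
    # Clear-cut case only: the network is both dyadic and heterophobic.
    return is_dyadic(d) and is_heterophobic(h)


print(looks_homophilic(2.0, 0.5))  # True
print(looks_homophilic(0.8, 0.5))  # False: heterophobic but not dyadic
```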

3. Homophily

Take a look at this network. At first glance, you can see that the green nodes are somewhat clustered together and linked to each other. In addition, the number of cross-label edges, which are colored red, is relatively low. Based on what you know about homophily, you would probably say that you are looking at a homophilic network.
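Counting same-label and cross-label edges is the first step toward checking that impression. The sketch below assumes the network is given as a plain undirected edge list plus a dictionary of node labels. The toy network is hypothetical and is not the one shown on the slide, and `count_edge_types` is an illustrative name rather than a library function.

```python
from collections import Counter


def count_edge_types(edges, labels):
    """Count undirected edges by the labels of their two endpoints.

    edges:  iterable of (u, v) pairs, each undirected edge listed once
    labels: dict mapping each node to "green" or "white"
    """
    counts = Counter()
    for u, v in edges:
        lu, lv = labels[u], labels[v]
        if lu != lv:
            counts["mixed"] += 1  # cross-label edge (drawn red on the slide)
        else:
            counts[lu] += 1       # same-label edge, e.g. green-green
    return counts


# Hypothetical toy network, not the one on the slide.
labels = {1: "green", 2: "green", 3: "green", 4: "white", 5: "white", 6: "white"}
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6)]
counts = count_edge_types(edges, labels)
print(counts)  # green=3, white=2, mixed=1
```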

4. Homophily

So let's verify this by computation. Here you see the network again. We start by counting the number and type of nodes and edges. As you can see in the R code, there are 40 nodes, assigned to the variable N, and 39 edges, which we denote by E. Furthermore, the number of green nodes, n_green, is 10, and the number of white nodes, n_white, is 30. Finally, there are 6 edges that connect two green nodes, denoted by e_green, and by counting the red edges we obtain 13 cross-label edges, which we denote by e_mixed. Next, we compute the connectance of the network, p, and from it the expected numbers of green and cross-label edges, which we name m_green and m_mixed, respectively. Now we can compute the dyadicity by dividing the number of green edges by the expected number of green edges. As you can see, the value is greater than 1, which means that the network is dyadic. We compute the heterophilicity in the same way, by dividing the actual number of mixed edges by the expected number of mixed edges, and get a value that is smaller than 1, which means that the network is heterophobic. As you recall, homophily is characterized by nodes with the same label being more connected to each other and nodes with different labels being less connected to each other. Since this network has dyadicity greater than 1 and heterophilicity less than 1, we can infer that it is homophilic.
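The same arithmetic can be reproduced with the counts from this example. This is a Python sketch rather than the R code on the slide. It assumes the standard definitions: connectance p = 2E / (N(N - 1)), expected green edges m_green = n_green(n_green - 1)/2 · p, and expected mixed edges m_mixed = n_green · n_white · p.

```python
def connectance(n_nodes: int, n_edges: int) -> float:
    # Fraction of all possible undirected edges that are actually present.
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Counts taken from the example network.
N, E = 40, 39
n_green, n_white = 10, 30
e_green, e_mixed = 6, 13

p = connectance(N, E)                       # 78 / 1560 = 0.05
m_green = n_green * (n_green - 1) / 2 * p   # expected green-green edges: 2.25
m_mixed = n_green * n_white * p             # expected green-white edges: 15

D = e_green / m_green   # dyadicity
H = e_mixed / m_mixed   # heterophilicity
print(f"p={p:.3f}, D={D:.2f}, H={H:.2f}")  # p=0.050, D=2.67, H=0.87
```

With D ≈ 2.67 > 1 and H ≈ 0.87 < 1, the numbers agree with the conclusion that the network is dyadic and heterophobic.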

5. Let's practice!

Now you should be able to determine whether a network is homophilic based on its dyadicity and heterophilicity. Note, however, that the value 1 is not a strict cut-off. A network can be homophilic even if it is only dyadic or only heterophobic, or if the values of D and H differ only slightly from 1. What matters is that the labels are not distributed randomly, so that we can use this information to make predictions in the network. A network that shows evidence of homophily is worth investigating more thoroughly: for each instance of interest, we extract features that characterize it based on its relational structure. This is the topic of the next chapter. For the remainder of this chapter, let's try some examples.
