Heterophilicity

1. Heterophilicity

In the last lesson you learned to measure dyadicity, that is, the connectedness between nodes with the same label. For a network to show signs of homophily, it is not enough for nodes with the same label to be more connected; there should also be fewer connections between nodes with opposite labels. This is measured with heterophilicity, the other parameter you need to capture the interplay between the network structure and node properties. Let's take a look!

2. Heterophilicity

Heterophilicity measures the connectedness between nodes with opposite labels, and thus how much interaction there is between the two groups. Let's take a look at these two networks, which both have 9 white nodes and 6 green nodes. Imagine, for example, that the green nodes are younger people, the white nodes are older people, and the edges denote friendships on Facebook. On the left, there are 4 edges that connect a younger person to an older person, so the two groups are quite separated. On the right, the number of cross-label edges is 11, so there is more interaction between older and younger people.
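Counting cross-label edges, as done by eye on the slide, can be sketched in a few lines. This is a minimal Python illustration, not the lesson's own code; the helper name `count_cross_label_edges` and the four-node toy network are hypothetical and are not the networks shown on the slide.

```python
def count_cross_label_edges(edges, labels):
    """Count edges whose two endpoints carry different labels."""
    return sum(1 for u, v in edges if labels[u] != labels[v])


# Hypothetical toy network: two white nodes and two green nodes.
labels = {"a": "white", "b": "white", "c": "green", "d": "green"}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

# a-c and b-d join a white node to a green node; a-b and c-d do not.
print(count_cross_label_edges(edges, labels))  # 2
```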

3. Heterophilicity

Heterophilicity measures the connectedness between nodes with different labels compared to what is expected in a random configuration of the network. To compute the expected number of cross-label edges, that is, edges that connect a white node and a green node, we use this combinatorial formula. We multiply the number of nodes of each type, denoted here by n_w and n_g, by the network connectance, p. In the network with 9 white nodes, 6 green nodes and connectance equal to 0.2, the expected number of cross-label edges is 9 times 6 times 0.2, which is 10.8, or roughly 11. Finally, we compute heterophilicity, or H, by dividing the actual number of cross-label edges by the expected number of cross-label edges, using the formula you see here.
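Written out, the quantities described on this slide look roughly as follows. The symbols m_wg (observed cross-label edges) and its barred version (expected cross-label edges) are notation assumed here for illustration, and the first expression assumes the usual definition of connectance for a simple undirected network with n nodes and m edges.

```latex
p = \frac{2m}{n(n-1)}, \qquad
\bar{m}_{wg} = n_w \, n_g \, p, \qquad
H = \frac{m_{wg}}{\bar{m}_{wg}} = \frac{m_{wg}}{n_w \, n_g \, p}

% Worked example from the lesson: n_w = 9, n_g = 6, p = 0.2
\bar{m}_{wg} = 9 \times 6 \times 0.2 = 10.8
```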

4. Heterophilicity

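Here you can see the two networks from before. The expected number of cross-label edges is 9 times 6 times 0.2, or 10.8, in both. The network on the left, with 4 cross-label edges, has heterophilicity 0.37, and the network on the right, with 11 cross-label edges, has heterophilicity 1.02.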
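The values can be checked by hand from the numbers quoted in this lesson: 9 white nodes, 6 green nodes, connectance 0.2, and 4 and 11 cross-label edges. A small Python sketch of that arithmetic, assuming both networks share the same connectance; the variable names are chosen here for illustration:

```python
# Numbers quoted in the lesson for the two example networks.
n_white, n_green, connectance = 9, 6, 0.2

# Expected number of white-green edges in a random configuration.
expected_cross = n_white * n_green * connectance

# Observed cross-label edges: 4 on the left, 11 on the right.
h_left = 4 / expected_cross
h_right = 11 / expected_cross

print(round(expected_cross, 1), round(h_left, 2), round(h_right, 2))
# 10.8 0.37 1.02
```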

5. Types of Heterophilicity

As we said before, heterophilicity compares the actual number of cross-label edges with the expected number of cross-label edges. We can therefore distinguish three scenarios depending on the value of H. If H is greater than 1, we say that the network is heterophilic, because there are more connections between nodes with different labels than expected. If H is (almost) equal to 1, the labels are distributed as they would be in a random network. If H is less than 1, we say that the network is heterophobic, since nodes with opposite labels tend not to be connected. Here you can see an example of label distribution in the same network for each of the three scenarios, with the corresponding value of H.
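The three scenarios can be turned into a small decision rule. The sketch below is illustrative only: the function name `classify_heterophilicity` and the tolerance `tol` that decides what counts as "almost equal to 1" are assumptions made here, since the lesson does not give a threshold.

```python
def classify_heterophilicity(h, tol=0.05):
    """Label a heterophilicity value H as one of the three scenarios.

    `tol` is an assumed cut-off for "almost equal to 1"; the lesson
    does not specify one, so pick a value that suits your data.
    """
    if abs(h - 1.0) <= tol:
        return "random-like"
    return "heterophilic" if h > 1.0 else "heterophobic"


for h in (1.5, 1.02, 0.39):
    print(h, classify_heterophilicity(h))
```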

6. Heterophilicity in the network of data scientists

Let's now compute the heterophilicity in the network of data scientists. On the right you can see the R code to compute the connectance of the network, the expected number of cross-label edges, and the heterophilicity, which we denote by H_rp. As you can see, the value of the heterophilicity is 0.39. It is less than 1, so there are fewer connections between R and Python nodes than we would expect in a random configuration. This implies that R users and Python users indeed do not collaborate that frequently, which means that the network is heterophobic!
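The R code on the slide is not part of this transcript. The same three steps (connectance, expected cross-label edges, heterophilicity) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the course's code: the function name `heterophilicity` and the six-node toy network are hypothetical, and the network is assumed to be simple and undirected with each edge listed once.

```python
def heterophilicity(edges, labels, label_a, label_b):
    """Heterophilicity H between two labels in a simple undirected network."""
    n = len(labels)                      # number of nodes
    m = len(edges)                       # number of edges
    p = 2 * m / (n * (n - 1))            # connectance

    n_a = sum(1 for lab in labels.values() if lab == label_a)
    n_b = sum(1 for lab in labels.values() if lab == label_b)
    expected = n_a * n_b * p             # expected cross-label edges
    if expected == 0:
        raise ValueError("expected number of cross-label edges is zero")

    observed = sum(
        1 for u, v in edges if {labels[u], labels[v]} == {label_a, label_b}
    )
    return observed / expected


# Hypothetical toy network (not the data scientist network from the course):
# three R users and three Python users, each group fully connected
# internally, with a single edge between the groups.
labels = {1: "R", 2: "R", 3: "R", 4: "Python", 5: "Python", 6: "Python"}
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]

h_rp = heterophilicity(edges, labels, "R", "Python")
print(round(h_rp, 2))  # 0.24
```

Here H is well below 1, so this toy network would be called heterophobic, in the same way as the data scientist network in the lesson.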

7. Let's practice!

The key takeaway from this video is that heterophilicity is a measure of how well connected nodes with opposite labels are. It's time to put this into practice.
