
Homophily

1. Homophily

The nodes in social networks are not linked at random. Because these networks represent people's social interactions, whether through being friends on Facebook or having watched the same movie on Netflix, there is always a reason for a connection.

2. Homophily explained

People often link to specific others because they share a common property, for example common interests or the same origin. This phenomenon is called homophily, a concept borrowed from sociology and summed up by the phrase 'Birds of a feather flock together'. It means that people have a strong tendency to associate with others whom they perceive as similar to themselves in some way. In network terms, similar nodes, as judged by their attributes, are more likely to connect to each other than dissimilar ones. Homophily in a network therefore shows up in how strongly nodes with the same label are connected to each other. If nodes with a certain label are connected to other nodes with the same label to a larger extent than you would expect by chance, the network is probably homophilic.

3. Homophilic Networks

Here you see two networks with green and white nodes. In the network on the left, the green nodes are spread randomly through the network, so a green node is about as likely to link to a white node as to another green node. This network is not homophilic. In the network on the right, by contrast, the green nodes are connected to each other to a much larger extent. This network is homophilic.

4. Types of edges

When we have a social network, it is useful to quantify its homophily. Homophily depends on the labels of the nodes that are connected, so it helps to define different types of edges. Let's look again at the network of data scientists. Depending on the preferred technologies of the two data scientists an edge connects, there are three types of edges, which we add to the network as an edge attribute. In the R code, you can see that the DataScienceNetwork dataframe has an additional attribute called label, indicating the combination of preferred technologies at the two ends of each edge.
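To make the idea of edge types concrete, here is a minimal sketch that derives the edge label from the labels of the two endpoints. It assumes a small, hypothetical network of six data scientists (nodes A to F) stored as two plain data frames, and uses base R rather than igraph; the node and edge data are invented for illustration and are not the course's DataScienceNetwork.

```r
# Hypothetical toy network: six data scientists and their preferred technology
nodes <- data.frame(
  name  = c("A", "B", "C", "D", "E", "F"),
  label = c("R", "R", "R", "Python", "Python", "Python"),
  stringsAsFactors = FALSE
)

# Hypothetical undirected edge list
edges <- data.frame(
  from = c("A", "A", "B", "C", "D", "D", "E"),
  to   = c("B", "C", "C", "D", "E", "F", "F"),
  stringsAsFactors = FALSE
)

# Look up the preferred technology at both ends of every edge
from_label <- nodes$label[match(edges$from, nodes$name)]
to_label   <- nodes$label[match(edges$to, nodes$name)]

# Edge type: 'rr' (both R), 'pp' (both Python), 'rp' (cross-label)
edges$label <- ifelse(from_label == to_label,
                      ifelse(from_label == "R", "rr", "pp"),
                      "rp")
print(edges)
```

Because the network is undirected, an R-Python edge and a Python-R edge both get the single label 'rp'.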

5. Types of edges

We also add an edge attribute called color, which colors each edge according to its label. Edges that connect two data scientists who prefer R, denoted by 'rr', are colored blue. Edges that connect two data scientists who prefer Python, denoted by 'pp', are colored green. Finally, edges that connect two data scientists who prefer different technologies, denoted by 'rp', are colored red. These edges are also called cross-label edges.
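The coloring step is a simple lookup from edge type to color. The sketch below assumes a hypothetical edge list whose label column already holds 'rr', 'pp' or 'rp', and uses a named character vector as the lookup table; it is an illustration, not the course's own code.

```r
# Hypothetical edge list with the edge type already stored in 'label'
edges <- data.frame(
  from  = c("A", "A", "B", "C", "D", "D", "E"),
  to    = c("B", "C", "C", "D", "E", "F", "F"),
  label = c("rr", "rr", "rr", "rp", "pp", "pp", "pp"),
  stringsAsFactors = FALSE
)

# Lookup table: edge type -> plotting color
edge_colors <- c(rr = "blue", pp = "green", rp = "red")

# Index the lookup by label; unname() drops the names copied from the lookup
edges$color <- unname(edge_colors[edges$label])
print(edges)
```

If the network is held as an igraph object instead, the same color vector can be assigned as an edge attribute with `E(g)$color`, where `g` is a placeholder name for the graph.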

6. Counting edge types

Here you can see the network with its edges colored. We will use the number of edges of each type later in this chapter, so let's count them and store the counts in variables. This is the R code to count the different types of edges. In this case, we condition on the label edge attribute. edge_rr is the number of edges connecting two R nodes, edge_pp is the number of edges connecting two Python nodes, and edge_rp is the number of cross-label edges. Finally, you can see the actual counts: there are ten rr edges, five pp edges and four cross-label edges.
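Conditioning on the label attribute amounts to comparing it with each type and summing the resulting logical vector, since TRUE counts as 1. Here is a minimal sketch on a hypothetical seven-edge network (not the data scientist network, so the counts differ from the ones quoted above):

```r
# Hypothetical edge list with edge types in the 'label' column
edges <- data.frame(
  from  = c("A", "A", "B", "C", "D", "D", "E"),
  to    = c("B", "C", "C", "D", "E", "F", "F"),
  label = c("rr", "rr", "rr", "rp", "pp", "pp", "pp"),
  stringsAsFactors = FALSE
)

# Summing a logical vector counts the TRUE entries
edge_rr <- sum(edges$label == "rr")  # edges between two R nodes
edge_pp <- sum(edges$label == "pp")  # edges between two Python nodes
edge_rp <- sum(edges$label == "rp")  # cross-label edges

c(edge_rr = edge_rr, edge_pp = edge_pp, edge_rp = edge_rp)
```

The same comparison works on an igraph edge attribute vector such as `E(g)$label`, and `table(edges$label)` gives all three counts at once.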

7. Network connectance

The last concept we need before we can continue is network connectance, which we denote by p. It is simply the ratio between the actual number of edges and the number of edges the network would have if it were fully connected. Here you can see the equation, the R code and the result for the data scientist network. Note that the number of edges in a fully connected network equals the number of nodes choose 2, so for a network with $N$ nodes and $E$ edges, $p = E / \binom{N}{2} = 2E / (N(N-1))$.
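As a worked example of the connectance formula, the sketch below computes p for a hypothetical undirected network with six nodes and seven edges, assuming no self-loops or repeated edges, so that the fully connected version has choose(6, 2) = 15 edges. The node list is kept separate from the edge list so that isolated nodes would still be counted in N.

```r
# Hypothetical undirected network without self-loops
nodes <- c("A", "B", "C", "D", "E", "F")
edges <- data.frame(
  from = c("A", "A", "B", "C", "D", "D", "E"),
  to   = c("B", "C", "C", "D", "E", "F", "F"),
  stringsAsFactors = FALSE
)

n_nodes <- length(nodes)          # N
n_edges <- nrow(edges)            # E

# Edges in the fully connected network: N choose 2
max_edges <- choose(n_nodes, 2)   # 15 for N = 6

# Connectance: actual edges divided by possible edges
p <- n_edges / max_edges
p
```

For an igraph object, `ecount(g) / choose(vcount(g), 2)` expresses the same ratio, and igraph's `edge_density()` should return the same value for a simple undirected graph.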

8. Let's practice!

Now it's your turn. In the exercises, you will work with the churn network and count the number of edges of each type.