
Link-Based Features

1. Link-based features

So far we have looked at some basic network features such as degree, transitivity, and betweenness. Now we will take a look at another type of feature, called link-based features. Link-based features are network features that take into account the properties of neighboring nodes.

2. Adjacency matrices

The edges in a network can be represented using an adjacency matrix. An adjacency matrix is a square matrix with one row and one column for each node in the network, where a non-zero entry indicates a link between the corresponding nodes. Here you can see the adjacency matrix for the network of data scientists. The link between nodes A and B is indicated with a 1 in row A, column B and also in row B, column A. We will need this representation of the network to compute the link-based features. If we have a network object in R, extracting the adjacency matrix is simple: you call the `get.adjacency` function in the `igraph` package with the network as an argument, as seen here.
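As a minimal sketch of that extraction step, the code below builds a small network and pulls out its adjacency matrix. The four-node network, the `edges` and `nodes` data frames, and the object name `g` are hypothetical stand-ins, not the data-scientist network from the slides. It also assumes a recent `igraph` release, where `as_adjacency_matrix` is the non-deprecated name for `get.adjacency`.

```r
library(igraph)

# Hypothetical toy network (NOT the data-scientist network from the slides)
edges <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("B", "C", "C", "D")
)
nodes <- data.frame(name = c("A", "B", "C", "D"))
g <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

# as_adjacency_matrix() replaces the older get.adjacency() name;
# sparse = FALSE returns an ordinary base R matrix instead of a sparse one
A <- as_adjacency_matrix(g, sparse = FALSE)
print(A)

# The A-B link appears twice: row A / column B and row B / column A
A["A", "B"]
A["B", "A"]
```

Because the network is undirected, the matrix is symmetric, which is why each link shows up in two cells.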

3. Link-based features

Link-based features are network features that take into account properties of neighboring nodes. In the case of the data scientists, this could be counting the number of neighbors that prefer R. You can see that node A has four neighbors that prefer R and node I has one. We can compute this for the whole network using the adjacency matrix and a vector indicating the preference. In the vector here we have indicated the preference of each data scientist with 1 for R and 0 for Python. Then we use matrix multiplication to multiply the matrix by the preference vector to obtain a new vector with the number of neighbors that prefer R. This works as follows: each row in the adjacency matrix represents the neighborhood of the respective node. The first row, for example, is the neighborhood of node A, and since there is a 1 in columns 2, 3, 4, and 5, it means that B, C, D, and E are neighbors of A. By taking the dot product of the row with the preference vector, we sum the preference values of the nodes in the neighborhood, which gives the number of neighbors that prefer R. For example, node G has three neighbors that prefer R. Here you can see how to compute this in R. We obtain the adjacency matrix `A` using `get.adjacency` and create a vector called `preference` indicating the preferred programming language. Then we compute the number of R neighbors using the `%*%` operator for matrix multiplication. The result is a vector with the number of neighbors that prefer R.
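The matrix-vector product can be sketched as follows. The adjacency matrix is typed in by hand so the block runs without `igraph`; in practice it would come from the network object. The four-node network, the preference values, and the result name `rNeighbors` are made up for illustration and do not reproduce the slide's data.

```r
# Hypothetical 4-node undirected network with links A-B, A-C, B-C, C-D
node_names <- c("A", "B", "C", "D")
A <- matrix(
  c(0, 1, 1, 0,
    1, 0, 1, 0,
    1, 1, 0, 1,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(node_names, node_names)
)

# Made-up preferences: 1 = prefers R, 0 = prefers Python
preference <- c(A = 1, B = 0, C = 1, D = 1)

# Each row of A dotted with preference counts that node's R-preferring neighbors
rNeighbors <- A %*% preference
print(rNeighbors)

# The same count for node A alone, written out as a dot product
sum(A["A", ] * preference)
```

In this toy network, node A is linked to B (Python) and C (R), so its count is 1. A node's own preference never enters its count because the diagonal of the matrix is zero.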

4. Neighborhood features

Here we see another example, where we compute the average of some property of the neighboring nodes, in this case the age of the data scientists. As you can see, their age is indicated on the nodes in the network. To get the total age of the neighborhood, we multiply the adjacency matrix by the age vector, and to obtain the average age, we divide by the number of nodes in the neighborhood, which is the degree. In the R code, we start again by extracting the adjacency matrix `A`. We also specify the age vector, called `age`, and we extract the degree using the `degree` function in the `igraph` package. Then we perform the calculation using the `%*%` operator and divide by the degree. The result is the vector `averageAge`, which indicates, for each node, the average age of the nodes in its neighborhood.
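A minimal sketch of the average-age calculation follows. The network, the ages, and the helper name `deg` are hypothetical and are not the values shown on the slide. As before, `as_adjacency_matrix` is used on the assumption of a recent `igraph` release.

```r
library(igraph)

# Hypothetical toy network with links A-B, A-C, B-C, C-D
g <- graph_from_literal(A - B, A - C, B - C, C - D)

# Made-up ages, reordered to match the vertex order of the graph
age <- c(A = 30, B = 40, C = 20, D = 50)[V(g)$name]

A   <- as_adjacency_matrix(g, sparse = FALSE)
deg <- degree(g)  # number of neighbors of each node

# Total neighborhood age divided by neighborhood size
averageAge <- (A %*% age) / deg
print(averageAge)
```

A node with no neighbors has degree 0, so this division would return `NaN` for it. Such nodes need separate handling if the network contains any.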

5. Let's practice!

Now it's your turn to compute some link-based features.