Introduction to the dataset

1. Introduction to the dataset

You’ve made so much progress! Great work! We’ve now reached the case study portion of this course, where you’ll apply everything you’ve learned so far to a single dataset to tie it all together!

2. Dataset & case study introduction

Let me first introduce the dataset. You will be analyzing a college forum dataset covering six months' worth of students posting to forums. As you might imagine, this naturally forms a bipartite graph in which the two node partitions are the “students” and the “forums”, and an edge exists between a student and a forum if that student posted to that forum. In this chapter, you will review a few activities, namely: constructing a graph from a pandas DataFrame, computing unipartite projections of a bipartite graph, visualizing the graph using Circos plots, and finally performing time-series filtering and analysis on the graph. Most of what we go through in the coming videos recaps functions and code you’ve already seen, just to help jog your memory.

3. Graphs from DataFrames

First off, let’s start by making a graph from a pandas DataFrame. Suppose we have a DataFrame, df, that is the edgelist of a customer-product bipartite graph. Assuming we have an empty graph G instantiated in memory, we can use the G dot add_nodes_from(container) method to add the nodes of each partition, passing in a column of data as the container (for example, df select ‘products’). To assign each node in the container to its partition, we also pass bipartite equals partition name to the add_nodes_from call. Calling G dot nodes, we can do a sanity check to make sure all of the nodes have been added correctly. Additionally, we can confirm that we haven’t added any edges yet by calling G dot edges; this is another good sanity check to perform. To add edges from the edgelist,

4. Graphs from DataFrames

recall that we can use the zip function. If you pass two list-like objects of equal length into zip, it iterates over them in parallel, yielding one pair of elements (one from each object) per index position; passing those pairs to G dot add_edges_from adds one customer-product edge per row. Calling G dot edges, we find that the edges have been added correctly, each one between a customer and a product. Now that we have the bipartite graph in memory, we can compute

5. Bipartite projections

the customer and the product projections. To do so, we need a container of nodes for each projection. Let’s assume here that we no longer have the original data; to collect the nodes that belong in each projection, we can use a list comprehension. Using the customer nodes as an example, we can write "n for n in G dot nodes if G dot nodes select n select ‘bipartite’ is equal to ‘customers’". The analogous operation can be done for the ‘products’ partition. Following this, we can call the “projected_graph” function from NetworkX’s bipartite module, passing in the original graph G and the respective node container. Checking each of the resulting graphs, prodG and custG, we see that prodG contains only product nodes and custG contains only customer nodes, which again serves as a good sanity check.
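The steps from the last three slides can be sketched end to end as follows. This is a minimal sketch, assuming pandas and NetworkX are installed. The toy DataFrame, its column names ‘customers’ and ‘products’, and the node labels c1…c3 and p1…p3 are hypothetical, chosen only to mirror the customer-product example. Note that the code reads node attributes with G.nodes[n], because the older G.node[n] accessor was removed in NetworkX 2.4.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite

# Hypothetical customer-product edgelist: each row links one customer to one product.
df = pd.DataFrame({
    "customers": ["c1", "c1", "c2", "c3"],
    "products": ["p1", "p2", "p1", "p3"],
})

# 1. Add the nodes of each partition, tagging each node with a 'bipartite' attribute.
#    Repeated values in a column are added only once.
G = nx.Graph()
G.add_nodes_from(df["customers"], bipartite="customers")
G.add_nodes_from(df["products"], bipartite="products")
print(G.nodes(data=True))  # sanity check: every node carries its partition label
print(G.edges())           # sanity check: no edges yet

# 2. zip pairs the i-th customer with the i-th product; each pair becomes one edge.
G.add_edges_from(zip(df["customers"], df["products"]))

# 3. Recover each partition's nodes from the 'bipartite' node attribute.
cust_nodes = [n for n in G.nodes() if G.nodes[n]["bipartite"] == "customers"]
prod_nodes = [n for n in G.nodes() if G.nodes[n]["bipartite"] == "products"]

# 4. Project onto each partition: two nodes are joined if they share a neighbor in G.
custG = bipartite.projected_graph(G, cust_nodes)
prodG = bipartite.projected_graph(G, prod_nodes)

print(custG.nodes(), custG.edges())
print(prodG.nodes(), prodG.edges())
```

In this toy data, c1 and c2 end up connected in custG because both are linked to p1. Likewise, p1 and p2 are connected in prodG because c1 is linked to both. c3 and p3 share no neighbors with other nodes of their partition, so they remain isolated nodes in their projections.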

6. Let's practice!

Great! You’re now ready to analyze the students-forums bipartite graph! Have fun!


Intermediate Network Analysis in Python
