
Representing network data with pandas

1. Representing network data with pandas

Good work! That part of this course was hopefully an educational challenge. We’re now going to take a short break from mental work and explore a bit about how to convert graphs into human-readable formats using CSV files and pandas. If you feel rusty with pandas, don’t worry: we won’t be doing fancy pandas-fu (meaning you won’t have to call me master sifu - get it? Kung-Fu Panda!), and I will provide you with the code patterns you’ll need to handle the exercises.

2. CSV files for network data storage

First off, let's talk about CSV-formatted edge lists, which look like what's shown on the screen. The file is comma-delimited; each row is one edge, and the columns denote the nodes involved in an edge and their metadata.
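To make the layout concrete, here is a small sketch of what such an edge list might look like and how it reads back row by row. The column names (node1, node2, weight, date) and all the values are invented for illustration; they are not taken from the course data.

```python
import csv
import io

# Hypothetical edge list: the column names and values are made up for illustration.
edge_csv = """node1,node2,weight,date
A,B,3,2024-01-05
A,C,1,2024-01-07
B,C,2,2024-02-11
"""

# Each row is one edge: node1/node2 name the two endpoints,
# and the remaining columns hold metadata about that edge.
edges = list(csv.DictReader(io.StringIO(edge_csv)))

for edge in edges:
    metadata = {k: v for k, v in edge.items() if k not in ("node1", "node2")}
    print(edge["node1"], edge["node2"], metadata)
```

Note that the csv module reads every value as a string, so a numeric column such as weight would still need converting before any arithmetic.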

3. CSV files for network data storage

Why CSV files for storing network data? Well, the first reason is that it’s a very human-readable format. Additionally, you can take advantage of the pandas library to efficiently analyze your network data at a basic level, say, counting the number of edges or the number of nodes, without having to first instantiate a Graph object. On the other hand, it does come with one particular disadvantage: the representation may not be as compact as a binary file, as the edge list may repeat node names many times (especially for highly connected nodes). That’s the biggest tradeoff, though in my opinion, human-readability and the ability to interface with pandas give CSV files a great advantage. So, how do we store the graph data in its entirety? Well, we will need two lists, each of which will become its own pandas DataFrame.
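As a rough sketch of that kind of basic analysis, the snippet below counts edges and nodes straight from an edge list, with no Graph object involved. The inline CSV text and the column names node1, node2 and weight are assumptions made for this example; in practice you would point pd.read_csv at your own file.

```python
import io

import pandas as pd

# Hypothetical edge list; column names and values are assumptions for this sketch.
edge_csv = """node1,node2,weight
A,B,3
A,C,1
B,C,2
A,D,5
"""
edge_df = pd.read_csv(io.StringIO(edge_csv))

# One row per edge, so the number of rows is the number of edges.
n_edges = len(edge_df)

# Stack both endpoint columns and count the distinct node IDs.
endpoints = pd.concat([edge_df["node1"], edge_df["node2"]])
n_nodes = endpoints.nunique()

# How often each node name is repeated in the file; highly connected
# nodes appear many times, which is the compactness cost mentioned above.
appearances = endpoints.value_counts()

print(n_edges, n_nodes)
print(appearances)
```

One caveat with counting nodes this way: a node that has no edges never shows up in an edge list, which is one reason a separate node list is useful.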

4. Node list and edge list

The first is a node list in which each row is one node, and the columns represent the metadata attached to that node (including the node ID itself). The second is an edge list in which each row is one edge, and the columns represent the metadata attached to that edge, including the IDs of the nodes of interest. So let’s say we have

5. Pandas and graphs

a graph G, with nodes and metadata attached to it, and we want to create a CSV file for the node list. How do we create it? Well, the key here is to create a list of dictionaries, which pandas will recognize as a “record”-style format (each dictionary is one record). We start by first instantiating an empty list, say, nodelist. We can then iterate over the nodes n and their dictionary metadata d. For every node, we create a new record dictionary node_data, and we use one key, say node, to uniquely identify it by its ID. The important thing here is to ensure that this key does not overlap with any of the metadata keys; otherwise, the next step would overwrite the node ID. Then, we update the record dictionary with the metadata dictionary, using the node_data dot update(d) method, which adds the key-value pairs from the metadata dictionary into the record dictionary. Finally, we append the node’s record dictionary to the nodelist.
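The steps just described can be sketched as follows. To keep the example free of dependencies, a plain list of (node, metadata) tuples stands in for the graph's nodes; with a NetworkX graph the same pairs would typically come from G.nodes(data=True), which is an assumption here since the loop source is not spelled out. The names and ages are made up.

```python
# Stand-in for the graph's nodes and their metadata dictionaries.
# (With NetworkX this would typically be G.nodes(data=True); assumed, not shown.)
nodes_with_data = [
    (1, {"name": "Alice", "age": 31}),
    (2, {"name": "Bob", "age": 27}),
    (3, {"name": "Carol", "age": 45}),
]

nodelist = []
for n, d in nodes_with_data:
    # Guard against a metadata key that would overwrite the node ID.
    if "node" in d:
        raise ValueError(f"metadata for node {n!r} already uses the key 'node'")

    node_data = {"node": n}     # the record starts with the node's ID
    node_data.update(d)         # then absorbs the metadata key-value pairs
    nodelist.append(node_data)  # one finished record per node

print(nodelist)
```

Building a fresh node_data dictionary for each node, rather than updating d in place, leaves the graph's own metadata dictionaries untouched.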

6. Pandas and graphs

As you can see here, the nodelist has now been transformed into a list of dictionaries, rather than a list of tuples. Once we have the data in that format, we can then

7. Pandas and graphs

pass the nodelist into the pandas dot DataFrame constructor, yielding the pandas DataFrame of interest. Because the keys are identical in each dictionary, each key becomes a column and each dictionary becomes one row, with its values filling in that row. We can then save it to disk using the DataFrame dot to_csv method.
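Here is a minimal sketch of that last step, assuming a nodelist shaped like the one described above. The records, the file name nodelist.csv, and the choice of a temporary directory are all illustrative; index=False is also a choice made for this example, to keep the DataFrame's row index out of the file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical records in the list-of-dictionaries format described above.
nodelist = [
    {"node": 1, "name": "Alice", "age": 31},
    {"node": 2, "name": "Bob", "age": 27},
    {"node": 3, "name": "Carol", "age": 45},
]

# Each dictionary becomes a row; each shared key becomes a column.
node_df = pd.DataFrame(nodelist)

# Write to a temporary location so the sketch leaves no stray files behind.
out_path = os.path.join(tempfile.mkdtemp(), "nodelist.csv")
node_df.to_csv(out_path, index=False)

# Reading the file back is a quick check that the round trip worked.
reloaded = pd.read_csv(out_path)
print(reloaded)
```

The same pattern should carry over to the edge list: build one record dictionary per edge, holding the two node IDs plus the edge metadata, and pass that list to the constructor.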

8. Let's practice!

Okay! Now that you’ve learned how to handle graph conversion to CSV files, let’s go get some practice!