Creating your retweet graph

1. Building a graph from raw data

In this lesson we'll build a graph up from raw Twitter data. We'll use our graph to understand which users are most central to the conversation about a given topic, in this case the rstats hashtag. We'll consider a couple of different ways to construct a graph, and then look at who the most influential tweeters are, how one-sided the conversations are, and which communities are using this hashtag.

2. Exploring the data

The data set we'll be working with represents a week's worth of tweets that mention the rstats hashtag. The data comes with lots of metadata about the tweets, but we're primarily interested in the screen name of the tweeter and the raw text of the tweet. Let's look at a couple of examples.

3. Anatomy of a tweet

You can see we have two different tweets. One is just a tweet about rstats, and the other is a retweet. One of the most obvious ways to construct a graph of the rstats conversation is to build one where vertices are screen names and directed edges are retweets. In this case we would draw a directed link from kom_256 to elenagbg. To do that we need to parse our data set and build up our graph tweet by tweet.

4. Loading the data

Here we're going to get a sense of what the raw data looks like. The key fields here are the screen name and the tweet text. These are the fields that will let us build our retweet graph. The screen name tells us who tweeted, and the text tells us two things when we parse it: first, that it was a retweet, because it starts with the capital letters RT, and second, who is being retweeted, in this case Rbloggers. By parsing that text we can build our igraph object.
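To make the parsing step concrete, here is a minimal sketch in plain Python rather than the R code the lesson uses. It assumes a retweet always begins with "RT @" followed by the screen name. The helper name find_rt and the sample tweet texts are made up for illustration.

```python
import re

# Assumption: a retweet starts with "RT @<screen_name>", where the screen
# name consists of letters, digits, and underscores.
RT_PATTERN = re.compile(r"^RT @(\w+)")


def find_rt(text):
    """Return the retweeted screen name, or None if the text is not a retweet."""
    match = RT_PATTERN.match(text)
    return match.group(1) if match else None


# Made-up tweet texts, for illustration only.
print(find_rt("RT @Rbloggers: a made-up example tweet #rstats"))
print(find_rt("A made-up original tweet about #rstats"))
```

The pattern is anchored at the start of the text, so a tweet that merely mentions "RT" somewhere in the middle is not counted as a retweet.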

5. Building the graph

Don't get overwhelmed here! What's going on is actually fairly simple. All we're doing is creating an empty graph, and adding all the screen names as vertices.
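As a rough illustration of this step, the sketch below uses a plain Python dictionary in place of an igraph object. Each key is a vertex and each value is that vertex's list of outgoing edges, which starts out empty. The field names and sample rows are hypothetical.

```python
# Hypothetical rows standing in for the raw tweet data.
tweets = [
    {"screen_name": "user_a", "text": "RT @user_b: made-up tweet #rstats"},
    {"screen_name": "user_b", "text": "made-up original tweet #rstats"},
    {"screen_name": "user_a", "text": "another made-up tweet #rstats"},
]

# Start with an empty graph, then add one vertex per distinct screen name.
# A vertex maps to the list of screen names it points to (none yet).
graph = {}
for tweet in tweets:
    graph.setdefault(tweet["screen_name"], [])

print(graph)
```

Using setdefault means a user who tweets several times still gets only one vertex.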

6. Building the graph

Next we loop over all the tweet texts. If a tweet is a retweet, we extract the retweeted screen name, make sure a vertex exists for that screen name, and then add a directed edge!
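The loop can be sketched in plain Python as follows. This is a stand-in for the igraph version, not the lesson's own code. It assumes edges point from the user who retweets to the user being retweeted, and that retweets start with "RT @". The sample rows are made up.

```python
import re

# Hypothetical rows standing in for the raw tweet data.
tweets = [
    {"screen_name": "user_a", "text": "RT @user_b: made-up tweet #rstats"},
    {"screen_name": "user_b", "text": "made-up original tweet #rstats"},
    {"screen_name": "user_c", "text": "RT @user_d: another made-up tweet #rstats"},
]

# One vertex per tweeting user, each with an empty list of outgoing edges.
graph = {tweet["screen_name"]: [] for tweet in tweets}

for tweet in tweets:
    match = re.match(r"^RT @(\w+)", tweet["text"])
    if match is None:
        continue  # not a retweet, so there is no edge to add
    retweeted = match.group(1)
    # The retweeted user may never have tweeted in this data set,
    # so make sure a vertex exists for them first.
    graph.setdefault(retweeted, [])
    # Assumed direction: retweeter -> retweeted user.
    graph[tweet["screen_name"]].append(retweeted)

print(graph)
```

Here user_d only appears inside a retweet, which is why the vertex check is needed before the edge is added.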

7. Cleaning the graph

The last thing we'll do is clean our graph up. It's easy to imagine that in some cases a user tweets but has no interaction with anyone. Because we're only studying the graph, we don't want to include these vertices. You can gauge the size of the problem by counting all the vertices with a degree of 0, and then simply delete those vertices.
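A minimal sketch of the cleanup, again using a plain Python dictionary of outgoing edges rather than an igraph object. Degree here is taken to mean incoming plus outgoing edges, and the small graph is made up for illustration.

```python
# Hypothetical graph: each vertex maps to the vertices it points to.
graph = {
    "user_a": ["user_b"],
    "user_b": [],
    "user_c": [],
}

# Total degree = outgoing edges plus incoming edges.
degree = {name: len(targets) for name, targets in graph.items()}
for targets in graph.values():
    for target in targets:
        degree[target] += 1

# Size the problem: how many vertices have no edges at all?
isolated = [name for name, d in degree.items() if d == 0]
print(len(isolated))

# Drop the isolated vertices. They have no edges, so no edge lists change.
cleaned = {name: targets for name, targets in graph.items() if name not in isolated}
print(cleaned)
```

Note that user_b has no outgoing edges but is retweeted by user_a, so it has a degree of 1 and stays in the graph.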

8. Let's practice!

Now that you've built this graph of retweets, let's start to analyze it!