1. Twitter network analysis
Network analysis of Twitter data is a great way to decipher interdependencies between Twitter users.
2. Lesson overview
In this lesson, we will learn the core concepts of networks and how they apply to social media.
We will also create a retweet network for an interesting topic.
3. Network and network analysis
A network is a system of interconnected objects.
Two classic examples of networks are a local area network of computers and a social media platform.
4. Network and network analysis
Network analysis is the process of mapping network objects to understand interdependencies and information flow within the network.
5. Components of a network
The two main components of a network are the nodes and edges.
The objects that are interconnected in a network are called nodes or vertices.
Every user in a Twitter network is a node or vertex.
6. Components of a network
The connections between these objects are called edges.
How the users are connected determines the edges.
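As a minimal sketch of these two components, the base R snippet below describes a tiny network as an edge list and derives the nodes from it. The user names are hypothetical and chosen only for illustration.

```r
# Hypothetical connections between three made-up users; each row is one edge.
edges <- data.frame(
  from = c("user_a", "user_b", "user_a"),
  to   = c("user_b", "user_c", "user_c"),
  stringsAsFactors = FALSE
)

# The nodes (vertices) are the distinct users found at either end of an edge.
nodes <- unique(c(edges$from, edges$to))

n_nodes <- length(nodes)  # number of nodes in the network
n_edges <- nrow(edges)    # number of edges in the network
```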
7. Directed vs undirected network
There are two broad types of networks based on the information flow: directed and undirected networks.
When the edges of a network point in one direction, the network is called a directed network. Here, the relationship between two connected nodes only works in the direction of the edge.
8. Directed vs undirected network
When the edges of a network have no direction, the network is an undirected network. The relationship between nodes works in both directions in such networks.
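The difference can be sketched with the igraph library by building the same edge list twice, once as a directed and once as an undirected network. The user names are hypothetical, and are_adjacent() is used here only to check whether an edge exists between two vertices.

```r
library(igraph)

# Hypothetical edge list: user_a -> user_b and user_b -> user_c.
el <- matrix(c("user_a", "user_b",
               "user_b", "user_c"),
             ncol = 2, byrow = TRUE)

directed_net   <- graph_from_edgelist(el, directed = TRUE)
undirected_net <- graph_from_edgelist(el, directed = FALSE)

# In the directed network, the edge exists only from user_a to user_b.
a_to_b <- are_adjacent(directed_net, "user_a", "user_b")
b_to_a <- are_adjacent(directed_net, "user_b", "user_a")

# In the undirected network, the same edge works in both directions.
b_to_a_undirected <- are_adjacent(undirected_net, "user_b", "user_a")
```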
9. Applications in social media
Twitter users tweet, like, follow, and retweet, creating complex network structures.
Analyzing the structure and size of networks facilitates identifying key players and influencers who are pivotal to transmitting information to a wide audience.
10. Retweet network
A retweet network is a network of Twitter users who retweet tweets posted by other users.
It is a directed network where the source vertex is the user who retweets and the target vertex is the user who posted the original tweet.
Understanding the position of potential customers on a retweet network allows a brand to identify key players who are likely to retweet posts to spread brand messaging.
11. Retweet network of #OOTD
Let's create a retweet network of users who retweet on the hashtag #OOTD, which stands for "Outfit Of The Day".
This hashtag is popular amongst young users for flaunting their outfits and can be used by fashion brands to grab the attention of potential customers.
12. Create the tweet data frame
First, we extract 18000 tweets on the hashtag #OOTD using search_tweets(), with retweets included.
13. Create data frame for the network
Next, we create a data frame containing only the screen_name and retweet_screen_name columns from the extracted tweets.
For the retweet network, the source vertex is the screen_name and the target vertex is the retweet_screen_name.
As some rows have NA values under retweet_screen_name, we will exclude these rows before proceeding further.
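A minimal sketch of this step is shown below. The tweets data frame is a small hypothetical stand-in for the extracted tweets (the real data has many more rows and columns), and the name rt_df is chosen only for illustration.

```r
# Hypothetical stand-in for the extracted tweets.
tweets <- data.frame(
  screen_name         = c("user_a", "user_b", "user_c"),
  retweet_screen_name = c("user_b", NA, "user_a"),
  text                = c("a retweet", "an original tweet", "another retweet"),
  stringsAsFactors = FALSE
)

# Keep only the two columns needed for the retweet network.
rt_df <- tweets[, c("screen_name", "retweet_screen_name")]

# Original tweets (not retweets) have NA under retweet_screen_name.
n_missing <- sum(is.na(rt_df$retweet_screen_name))
```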
14. Include only retweets in the data frame
To remove rows with NA values, we use the complete.cases() function.
This function takes the data frame as input and retains only rows without NA values.
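As a sketch, complete.cases() returns one logical value per row, which can then be used to filter the data frame. The data and the names rt_df and rt_df_new below are hypothetical.

```r
# Hypothetical two-column data frame; the second row is not a retweet.
rt_df <- data.frame(
  screen_name         = c("user_a", "user_b", "user_c"),
  retweet_screen_name = c("user_b", NA, "user_a"),
  stringsAsFactors = FALSE
)

# TRUE for rows that have no NA in any column.
keep <- complete.cases(rt_df)

# Retain only the complete rows, that is, the retweets.
rt_df_new <- rt_df[keep, ]
```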
15. Convert data frame to a matrix
To create a network, we need the contents saved as a matrix.
The as.matrix() function converts the data frame to a matrix.
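A minimal sketch of the conversion, using a hypothetical data frame of retweets. The result is a two-column character matrix in which each row is one edge.

```r
# Hypothetical data frame of retweets with no NA values left.
rt_df_new <- data.frame(
  screen_name         = c("user_a", "user_c"),
  retweet_screen_name = c("user_b", "user_a"),
  stringsAsFactors = FALSE
)

# Convert to a character matrix: column 1 is the source, column 2 the target.
matrx <- as.matrix(rt_df_new)
```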
16. Create the retweet network
We are now ready to create the retweet network using graph_from_edgelist() from the igraph library.
This function takes two arguments: the edge list, el, set to the matrix, and directed, set to TRUE for a directed network.
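The call can be sketched as follows, assuming the igraph library is installed. The edge matrix and the name nw_rtweet are hypothetical; each row runs from the user who retweets to the user who posted the original tweet.

```r
library(igraph)

# Hypothetical edge matrix: user_a retweets user_b, user_c retweets user_a.
matrx <- matrix(c("user_a", "user_b",
                  "user_c", "user_a"),
                ncol = 2, byrow = TRUE)

# Build the directed retweet network from the edge list.
nw_rtweet <- graph_from_edgelist(el = matrx, directed = TRUE)

n_vertices <- vcount(nw_rtweet)  # distinct users in the network
n_edges    <- ecount(nw_rtweet)  # retweet relationships
```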
17. View the retweet network
We use the print.igraph() function to view the retweet network.
18. View the retweet network
Here, the D in DN indicates that it is a directed network, and the N indicates that the vertices are named. The network has 4100 edges and 4616 vertices.
The source and target vertex names can be seen separated by arrows.
For example, "MaikielYungin" is the source vertex and "ZingletC" is the target vertex in the first row.
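To inspect a network in the same way, the sketch below prints a small hypothetical retweet network and then pulls out its edges as a matrix of source and target names. Calling print() on an igraph object dispatches to print.igraph(); as_edgelist() is used here as one convenient way to read the edges programmatically.

```r
library(igraph)

# Hypothetical retweet network: user_c and user_b both retweet user_a.
matrx <- matrix(c("user_c", "user_a",
                  "user_b", "user_a"),
                ncol = 2, byrow = TRUE)
nw_rtweet <- graph_from_edgelist(el = matrx, directed = TRUE)

# Prints a header line with the DN flags and the list of edges.
print(nw_rtweet)

# Column 1 holds the source vertex and column 2 the target vertex of each edge.
edge_names   <- as_edgelist(nw_rtweet)
first_source <- edge_names[1, 1]
first_target <- edge_names[1, 2]
```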
We have now successfully created a retweet network. We will identify key players from this network using network centrality measures in the next lesson.
19. Let's practice!
Let's practice by creating a retweet network on the hashtag #travel!