1. Analyzing Twitter data
Welcome to the course on analyzing social media data with R.
I am Sowmya Vivek, a data science coach and consultant in analytics and NLP.
2. Course Overview
In this course, you will learn to extract and visualize Twitter data, analyze tweet text, perform network analysis, and plot tweets on a map.
We will explore tweets on celebrities, brands, hot topics, and sports.
Let's start with why Twitter data is worth analyzing and the pros and cons of using it.
3. Introduction to social media analysis
Social media analysis is the process of collecting data from social media websites and analyzing this data to derive insights for better business decisions.
4. About Twitter
Twitter is a popular social media platform where people communicate via short messages called tweets.
It is popular for micro-blogging and offers a huge amount of information in the form of tweets and the metadata around them.
5. Power of Twitter data
How powerful is Twitter data in terms of available information?
Let's look at some facts and figures:
With 500 million tweets sent each day
6. Power of Twitter data
and 330 million people tweeting every month, the information available for analysis is enormous.
7. Power of Twitter data
According to Twitter, 80% of its users are affluent millennials.
8. Power of Twitter data
Ads on Twitter during live events are 11% more effective at engaging audiences than TV ads.
9. Power of Twitter data
40% of users say that they have made a purchase because of influencers' tweets. All these facts provide strong motivation for using Twitter data for analysis.
10. Volume of tweets
To demonstrate the volume and velocity of tweets, we will look at a simple example.
Many functions are available in R to extract tweets for analysis, and some of these will be covered in the forthcoming lessons.
One such function, stream_tweets(), samples 1% of all publicly available live tweets for a 30-second window by default.
11. Volume of tweets
In this example, the live tweets extracted using stream_tweets() are saved in a data frame.
Viewing the dimensions of the data frame, we see that 1047 live tweets were extracted. This is just a 1% random sample for a 30-second window, which indicates the magnitude and velocity of tweets posted.
Also, there are 90 columns providing rich information about each tweet.
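The call described above can be sketched as follows. This is a minimal sketch, assuming the rtweet package is installed, Twitter API credentials are already configured, and the installed rtweet version still provides stream_tweets(). The helper summarise_stream() is a hypothetical name introduced only for illustration, and the live call is guarded so that it runs only in an interactive session.

```r
# Hypothetical helper (not part of rtweet): report how many tweets (rows)
# and how many fields (columns) a data frame of tweets contains.
summarise_stream <- function(tweets) {
  c(tweets = nrow(tweets), columns = ncol(tweets))
}

# Assumption: rtweet is installed and API access is configured.
# The live call is skipped in non-interactive runs.
if (interactive() && requireNamespace("rtweet", quietly = TRUE)) {
  # q = "" requests the random sample of public tweets;
  # the timeout argument defaults to 30 seconds.
  live_tweets <- rtweet::stream_tweets(q = "")

  # dim() returns rows (tweets) and columns (fields per tweet).
  print(dim(live_tweets))
  print(summarise_stream(live_tweets))
}
```

The number of rows will differ on every run, because the sample depends on what is being posted during the streaming window.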
12. Volume of tweets
We can extract live tweets using the same function and specify a time window of 60 seconds with the timeout argument.
You can see that the number of live tweets extracted has more than tripled, from 1047 to 3464.
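To compare the two runs, it can help to convert each count into an average rate. The sketch below does that arithmetic with the counts quoted in this lesson and then shows the 60-second call. The helper tweets_per_second() is a hypothetical name, and the live call again assumes rtweet is installed with working API credentials.

```r
# Hypothetical helper: average number of tweets captured per second.
tweets_per_second <- function(n_tweets, window_secs) {
  n_tweets / window_secs
}

# Counts quoted in the lesson for the 30-second and 60-second windows.
rate_30 <- tweets_per_second(1047, 30)
rate_60 <- tweets_per_second(3464, 60)
print(round(c(rate_30 = rate_30, rate_60 = rate_60), 1))

# Assumption: rtweet is installed and API access is configured.
if (interactive() && requireNamespace("rtweet", quietly = TRUE)) {
  # timeout sets the streaming window in seconds.
  live_tweets60 <- rtweet::stream_tweets(q = "", timeout = 60)
  print(dim(live_tweets60))
}
```

The two runs work out to roughly 35 and 58 sampled tweets per second, so the count does not scale exactly with the window length. This suggests that the volume of live tweets fluctuates from moment to moment.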
13. Applications of Twitter data
Twitter data can be used for a wide range of applications such as understanding current topics trending across the world,
14. Applications of Twitter data
evaluating customer opinion about a brand,
15. Applications of Twitter data
analyzing public sentiment toward a political party, leader, or event,
16. Applications of Twitter data
visualizing the reach of a movie, brand, or personality, and
17. Applications of Twitter data
detecting events such as an epidemic or a protest.
18. Advantages of Twitter data
The biggest advantage of using Twitter for social media analysis is that the Twitter API is more open and accessible than those of other social media platforms.
It is easier to find and follow conversations on Twitter because of its hashtag conventions.
Because tweet length is limited, running algorithms on tweets is easier and more controlled.
19. Limitations of Twitter data
Let's look at the limitations of Twitter data.
Twitter limits how far back a free account can search historical tweets.
There are also limitations on the number of tweets that can be extracted for a free account.
The tweets extracted are a 1% sample of all tweets and so may not accurately represent the full stream.
Besides, only a very small percentage of tweets are accurately tagged for geographic location.
20. Let's practice!
Let's practice what we learned before we take a deep dive into extracting and analyzing Twitter data.