Extracting Twitter data
1. Extracting Twitter data
We learned about the power of Twitter in the previous lesson. Let's learn how to set up the R environment to access Twitter and extract data.
2. Lesson Overview
In this lesson, we will get to know the fundamentals of APIs and the types of Twitter API. We will learn to set up the R environment to interact with Twitter and extract data.
3. API explained
API stands for Application Programming Interface. It is a software intermediary that allows two applications to talk to each other so that they can request and deliver information. The Twitter APIs let an application, such as an R session, request tweets and tweet attributes from Twitter.
4. API-based subscriptions
There are different Twitter API subscription levels targeted at various end-users. The standard API is free and provides basic queries for searching and streaming tweets from the past 7 days.
5. API-based subscriptions
The premium and enterprise APIs follow paid subscription models and provide access to the last 30 days or to the full archive of tweets. For our course, we will work with the standard API. The concepts we learn can be applied to the paid subscription models too.
6. Prerequisites to set up R
Before setting up the R environment on your computer, you need to take care of some prerequisites: having a Twitter account, disabling the pop-up blocker in the web browser, opening an interactive R session, and installing the packages rtweet and httpuv in R. Please note that all these prerequisites have already been set up within the DataCamp interface for this course, so no action is required on your part.
7. The rtweet and httpuv packages
rtweet is a powerful R package with several functions to extract Twitter data and convert it to data frames. The httpuv package helps authenticate Twitter API access via a web browser. It is a building block for other R packages.
8. Setting up the R environment
To set up the R environment on your computer, the following steps need to be performed once: loading the rtweet and httpuv libraries, calling the search_tweets() function with a search query to connect to Twitter for the first time, and authorizing Twitter access via a web browser pop-up. The message "Authentication complete" confirms authorization of Twitter access. Note that the R environment in the DataCamp interface has already been set up for this course, so no action is required on your part.
9. Extract twitter data: search_tweets()
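The one-time setup steps described above could look roughly like this on your own computer. This is a minimal sketch, assuming rtweet and httpuv are already installed and that the session is interactive so the browser pop-up can open. The query "#rstats" and the variable names are placeholders chosen for illustration, not taken from the lesson.

```r
# Placeholder query used only to trigger the first connection (an assumption).
setup_query <- "#rstats"

# Browser-based authorization needs an interactive R session,
# so the network call is skipped when the script runs non-interactively.
if (interactive()) {
  # Assumes install.packages(c("rtweet", "httpuv")) has been run beforehand.
  library(rtweet)
  library(httpuv)

  # The first request should open a browser window asking to authorize access.
  first_tweets <- search_tweets(setup_query, n = 10)
}
```

Once access has been authorized, later calls in the same setup should not need the browser step again.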
Let us extract our first set of tweets using the search_tweets() function. This function returns Twitter data from the past 7 days that match a user-provided search query. The maximum number of tweets returned from a single request is 18,000. We pass the following arguments to search_tweets(): the query to search, the number of tweets to return, whether to include or exclude retweets, and the language of the tweets.
10. Extract twitter data: search_tweets()
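A call with the four arguments just described might look like the sketch below. The argument names q, n, include_rts and lang follow the rtweet documentation as far as we know and may differ between package versions. The values for n, include_rts and lang are assumptions chosen for illustration; only the "Game of Thrones" query comes from the lesson.

```r
# Search query from the lesson; the other values below are illustrative.
query <- "Game of Thrones"
n_tweets <- 100          # the lesson states a cap of 18,000 per request
exclude_retweets <- TRUE # assumed choice for this example
tweet_language <- "en"   # assumed choice for this example

# The request needs authorized Twitter access, so it only runs interactively.
if (interactive()) {
  library(rtweet)

  tweets_got <- search_tweets(
    q = query,
    n = n_tweets,
    include_rts = !exclude_retweets,
    lang = tweet_language
  )

  # Show the first few rows of the returned data frame.
  print(head(tweets_got))
}
```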
The output here shows the first few rows of tweets extracted on "Game of Thrones".
11. Extract twitter data: get_timeline()
get_timeline() is another function from rtweet. It returns up to 3,200 tweets posted by a specified Twitter user. Any requested number above 3,200 is ignored, and the function returns at most 3,200 tweets. We pass the following arguments to get_timeline(): the username whose timeline is to be accessed and the number of tweets to return.
12. Extract twitter data: get_timeline()
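A timeline request along these lines could be sketched as follows. The handle "katyperry" is an assumption about the account used in the lesson, and the explicit cap with min() is only there to make the 3,200 limit visible; it is not something the function requires.

```r
# Assumed handle for the account shown in the lesson.
user_name <- "katyperry"

# Asking for more than 3,200 tweets gains nothing, so cap the request up front.
n_requested <- 5000
n_tweets <- min(n_requested, 3200)

# The request needs authorized Twitter access, so it only runs interactively.
if (interactive()) {
  library(rtweet)

  timeline <- get_timeline(user = user_name, n = n_tweets)

  # Show the first few rows of the returned data frame.
  print(head(timeline))
}
```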
The output shows a few rows of tweets posted by Katy Perry on her timeline.
13. Let's practice!
Congrats! You have learned to set up the R environment for authorizing Twitter access and to extract tweets. Let's practice!