Plotting Twitter data over time

1. Plotting Twitter data over time

In this lesson, you will learn to analyze Twitter data over time. We do this analysis to detect changing trends and gauge interest in a product or brand.

2. Lesson overview

This lesson will cover: what time series data is, creating time series objects and plots, visualizing tweet frequency over time, and comparing the brand salience of two brands. Brand salience is the extent to which a brand is talked about, and the volume of tweets is a strong indicator of it.

3. Time series data

A time series is a series of data points indexed sequentially over time. Treating tweets as time series data lets us visualize how tweet frequency changes over time.

4. Extracting tweets for time series analysis

For time series analysis, we extract tweets using search_tweets() from the rtweet package. We pass search_tweets() three arguments: the query to search, the number of tweets to return, and whether to include or exclude retweets.

5. Extracted tweet data

We see that the created_at column has the timestamp of the tweets in the output.

6. Visualize frequency of tweets

Twitter data can help monitor engagement for a product, indicating levels of interest. Visualizing tweet frequency provides insights into this interest level.

7. Visualize tweet frequency

Let's visualize tweet frequency on the automobile brand "Camry". We start by extracting tweets on "#camry" using search_tweets() and assigning the result to a data frame.
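As a rough sketch, the extraction step might look like the following. The helper name fetch_tweets and the default of 100 tweets are our own assumptions, not values from the lesson, and the call only works with the rtweet package installed and authenticated Twitter API access.

```r
# Hypothetical helper wrapping rtweet::search_tweets(); the name and the
# default n are illustrative assumptions, not taken from the lesson.
fetch_tweets <- function(query, n = 100) {
  rtweet::search_tweets(
    q = query,           # the query to search
    n = n,               # number of tweets to return
    include_rts = FALSE  # exclude retweets (an assumed choice)
  )
}

# Usage (requires the rtweet package and API credentials):
# camry_tweets <- fetch_tweets("#camry")
```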

8. Visualize tweet frequency

A few rows of the extracted tweets are shown here.

9. Create time series plot

To create a time series plot, use ts_plot(). We pass it three arguments: the tweets data frame, the by argument to specify the time interval, and the color of the line plot. We can see that the tweet activity on Camry is consistently low except for a couple of spikes.
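A minimal sketch of this plotting call is shown below. The wrapper name plot_tweet_frequency, the hourly interval, and the blue line color are assumptions for illustration, since the lesson does not state the exact values used.

```r
# Hypothetical wrapper around rtweet::ts_plot(); the interval and color
# defaults are assumptions, not values from the lesson.
plot_tweet_frequency <- function(tweets, by = "hours", color = "blue") {
  rtweet::ts_plot(
    tweets,        # data frame of tweets with a created_at timestamp column
    by = by,       # time interval used to aggregate the tweets
    color = color  # passed on to the line layer of the plot
  )
}

# Usage, assuming camry_tweets came from search_tweets():
# plot_tweet_frequency(camry_tweets)
```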

10. Compare frequency of tweets

The volume of tweets posted for a product is a strong indicator of its brand salience. Let's compare the brand salience of Tesla and Camry.

11. Compare frequency of tweets

Convert the tweets extracted on Camry into a time series object using the ts_data() function. A time series object contains the aggregated frequency of tweets over a specified time interval. We pass ts_data() two arguments: the tweets data frame and the desired time interval. The output has two columns: the time and the tweet frequency.

12. Compare frequency of tweets

Next, rename the columns to time and camry_n using the names() function.

13. Compare frequency of tweets

Let's repeat the same steps to create another time series object for "Tesla". We now have two time series objects with columns for time and tweet frequencies.
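Because the same steps are applied to both brands, they could be bundled into a small helper. This is only a sketch: the function name tweets_to_ts and the hourly interval are assumptions, and camry_tweets and tesla_tweets stand for data frames returned by search_tweets().

```r
# Hypothetical helper: aggregate tweets into a time series object with
# rtweet::ts_data(), then rename its two columns with names().
tweets_to_ts <- function(tweets, count_name, by = "hours") {
  ts <- rtweet::ts_data(tweets, by = by)  # columns: time and tweet frequency
  names(ts) <- c("time", count_name)      # e.g. "time" and "camry_n"
  ts
}

# Usage (requires rtweet and previously extracted tweets):
# camry_ts <- tweets_to_ts(camry_tweets, "camry_n")
# tesla_ts <- tweets_to_ts(tesla_tweets, "tesla_n")
```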

14. Compare frequency of tweets

We merge the two objects into a single data frame using the merge() function. We pass it three things: the time series objects to be merged, the by argument, which names the common column to merge on, and the all argument, which controls whether rows without a match in the other object are kept.
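To make the effect of the all argument concrete, here is a small sketch using base R's merge(). The hourly counts are made-up placeholder values standing in for two ts_data() outputs, not real tweet data.

```r
# Made-up hourly counts standing in for the two time series objects
camry_ts <- data.frame(
  time = as.POSIXct(c("2019-07-01 00:00:00", "2019-07-01 01:00:00",
                      "2019-07-01 02:00:00"), tz = "UTC"),
  camry_n = c(3, 0, 5)
)
tesla_ts <- data.frame(
  time = as.POSIXct(c("2019-07-01 01:00:00", "2019-07-01 02:00:00",
                      "2019-07-01 03:00:00"), tz = "UTC"),
  tesla_n = c(40, 52, 47)
)

# by = "time" merges on the shared column; all = TRUE keeps hours that
# appear in only one object and fills the missing count with NA
merged_df <- merge(camry_ts, tesla_ts, by = "time", all = TRUE)
print(merged_df)
```

With all = TRUE, the merged data frame covers every hour present in either object, so a brand with no count for a given hour shows NA there.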

15. Compare frequency of tweets

Next, use the melt() function to stack the frequency counts in one column and the brand names in another. We pass melt() three arguments: the data frame to melt, na.rm to specify whether rows with missing values are dropped, and id.vars to name the columns to keep as they are. All other columns get stacked, so the output has three columns: time, variable, and value.
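The reshaping can be sketched as follows, assuming melt() comes from the reshape2 package (the lesson does not name the package). The merged data frame here is filled with made-up placeholder values.

```r
library(reshape2)

# Made-up merged data frame with one missing Camry count
merged_df <- data.frame(
  time = as.POSIXct(c("2019-07-01 00:00:00", "2019-07-01 01:00:00",
                      "2019-07-01 02:00:00"), tz = "UTC"),
  camry_n = c(3, NA, 5),
  tesla_n = c(40, 52, 47)
)

# Keep time as the identifier column, stack the two count columns,
# and drop stacked rows whose value is missing
melt_df <- melt(merged_df, na.rm = TRUE, id.vars = "time")
print(melt_df)
```

The variable column holds the original column names (camry_n, tesla_n), which is what lets a plot color each brand separately.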

16. Compare frequency of tweets

The last step is to plot the frequency of tweets on Camry and Tesla using ggplot(). Here, the melted data frame is the first argument. Under aesthetics, we input the relevant column names as values for the x-axis, y-axis, and color of the plot. The geom_line() line width is set to 0.8.
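A minimal sketch of the plotting step is given below, using made-up values in place of the melted tweet counts. It assumes a recent ggplot2 release, where line thickness is set with linewidth; older releases use size for the same purpose.

```r
library(ggplot2)

# Made-up melted data: one row per time point and brand
hours <- as.POSIXct(c("2019-07-01 00:00:00", "2019-07-01 01:00:00",
                      "2019-07-01 02:00:00"), tz = "UTC")
melt_df <- data.frame(
  time = rep(hours, times = 2),
  variable = rep(c("camry_n", "tesla_n"), each = 3),
  value = c(3, 0, 5, 40, 52, 47)
)

# Map time to the x-axis, frequency to the y-axis, and brand to line color
p <- ggplot(data = melt_df, aes(x = time, y = value, col = variable)) +
  geom_line(linewidth = 0.8)  # on older ggplot2 versions: size = 0.8

# print(p) draws the comparison plot
```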

17. The comparison plot

It's interesting to see that there are more tweets on Tesla than on Camry. This indicates a higher brand salience for Tesla than for Camry.

18. Let's practice!

Now that we've seen how to analyze tweet frequency over time and compare the same for two brands using time series visualization, let's practice.