
Time series

1. Time series

Now you know how to check how many tweets mention a word or phrase in a Twitter dataset. We're going to build on this by illustrating how mentions of keywords change over time.

2. Time series data

Tweets about companies, products, and political issues vary by the day, hour, minute, and even second. We want to capture that variation over time. Data tagged with a date and time is known as "time series" or "over time" data. Time series data typically contains a date or datetime and some type of numerical measure. In the data frame above, we have a datetime, a count of mentions of a political candidate in each time period, and the name of that political candidate.
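As a rough illustration of that structure, here is a small data frame with the three kinds of columns just described. The column names, candidate labels, and counts are all made up for this sketch; they are not taken from the course dataset.

```python
import pandas as pd

# Hypothetical values: one row per candidate per one-minute period.
mentions = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2018-03-12 15:04:00",
        "2018-03-12 15:05:00",
        "2018-03-12 15:04:00",
        "2018-03-12 15:05:00",
    ]),
    "mentions": [12, 9, 7, 11],  # numerical measure for the period
    "candidate": ["Candidate A", "Candidate A", "Candidate B", "Candidate B"],
})

print(mentions)
```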

3. Converting datetimes

We can convert the date information from a string to a `datetime` type. Pandas is smart enough to convert the Twitter date format, stored in `created_at`, to a datetime type with the `pd.to_datetime` function. Next, we set the index of the DataFrame to the `created_at` column using the `set_index` method. With a datetime index in place, we can use a number of useful time series methods.
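A minimal sketch of these two steps, assuming a DataFrame named `ds_tweets` with a `created_at` string column. The sample rows are invented, and the explicit `format` argument is an addition for this sketch to make the parsing unambiguous; pandas can usually infer the Twitter format without it.

```python
import pandas as pd

# Hypothetical tweets; created_at follows Twitter's timestamp layout.
ds_tweets = pd.DataFrame({
    "created_at": [
        "Mon Mar 12 15:04:05 +0000 2018",
        "Mon Mar 12 15:04:45 +0000 2018",
        "Mon Mar 12 15:05:10 +0000 2018",
    ],
    "text": ["I like Facebook", "Just Google it", "facebook and google"],
})

# Convert the strings to timezone-aware datetimes.
ds_tweets["created_at"] = pd.to_datetime(
    ds_tweets["created_at"], format="%a %b %d %H:%M:%S %z %Y"
)

# Use the datetimes as the index so time series methods become available.
ds_tweets = ds_tweets.set_index("created_at")

print(ds_tweets.index)
```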

4. Keywords as time series metrics

Now that we have our data frame in a time series format, we need to produce a metric which can be graphed over time. Our function `check_word_in_tweet` returns a boolean Series which indicates which rows contain our keyword and which do not. Remember that the boolean value True is treated as the numerical value one, and False as zero. Therefore, we can produce a column for each keyword we're interested in and track its prevalence over time. If we sum up this column, we get an overall count of how many tweets contain that keyword.
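The idea can be sketched as follows. The `check_word_in_tweet` defined here is a simplified stand-in that only searches a `text` column, so it is an assumption rather than the function built earlier in the course, and the tweets are invented.

```python
import pandas as pd

def check_word_in_tweet(word, data):
    """Simplified stand-in: True where the tweet text contains `word`,
    ignoring case. Only the `text` column is searched."""
    return data["text"].str.contains(word, case=False, regex=False)

# Hypothetical tweets.
ds_tweets = pd.DataFrame({
    "text": ["I like Facebook", "Just Google it", "facebook and google"],
})

# One boolean column per keyword.
ds_tweets["facebook"] = check_word_in_tweet("facebook", ds_tweets)
ds_tweets["google"] = check_word_in_tweet("google", ds_tweets)

# True counts as 1, so the sum is the number of matching tweets.
facebook_count = ds_tweets["facebook"].sum()
google_count = ds_tweets["google"].sum()
print(facebook_count, google_count)
```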

5. Generating keyword means

Now that we have a metric, we can begin plotting the keyword over time. We first generate a summary statistic over the metric we're interested in. We can use the Series method `resample` for this purpose. `resample` groups the rows into time windows of our choice and lets us apply a function to each window. We'll use `resample` with the `mean` method to generate averages over one-minute windows. Because the metric is boolean, each average is the proportion of tweets within the window that mention the keyword.
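Here is a small sketch of the resampling step, assuming the keyword columns already exist on a datetime-indexed `ds_tweets`. The timestamps and flags are made up so that the proportions are easy to check by hand.

```python
import pandas as pd

# Hypothetical data: two tweets in minute 15:04 and two in minute 15:05.
index = pd.to_datetime([
    "2018-03-12 15:04:05",
    "2018-03-12 15:04:45",
    "2018-03-12 15:05:10",
    "2018-03-12 15:05:50",
])
ds_tweets = pd.DataFrame(
    {
        "facebook": [True, False, True, True],
        "google": [False, True, False, False],
    },
    index=index,
)

# Average each boolean column within one-minute windows.
mean_facebook = ds_tweets["facebook"].resample("1min").mean()
mean_google = ds_tweets["google"].resample("1min").mean()

print(mean_facebook)
print(mean_google)
```

In the first window one of the two tweets mentions Facebook, so its mean is 0.5; in the second window both do, so the mean is 1.0.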

6. Plotting keyword means

Lastly, we'll plot those keyword means over time. We import `matplotlib.pyplot` as `plt`, then we use `plt.plot` to create the plot. On the x-axis, we'll use the minute index and on the y-axis, we'll use the generated mean. We'll color Facebook blue and Google green. In this dataset, we see that mentions of Facebook seem generally higher over time compared to mentions of Google.
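A sketch of the plotting step under the same assumptions. The per-minute means below are invented stand-ins for the resampled series, and the axis labels and legend are additions for readability rather than something the transcript specifies.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical per-minute means, standing in for the resampled series.
minutes = pd.date_range("2018-03-12 15:04", periods=4, freq="min")
mean_facebook = pd.Series([0.50, 0.40, 0.60, 0.55], index=minutes)
mean_google = pd.Series([0.20, 0.30, 0.25, 0.10], index=minutes)

# x-axis: minute of each window; y-axis: proportion of tweets mentioning the keyword.
plt.plot(mean_facebook.index.minute, mean_facebook, color="blue", label="Facebook")
plt.plot(mean_google.index.minute, mean_google, color="green", label="Google")

plt.xlabel("Minute")
plt.ylabel("Proportion of tweets")
plt.legend()
```

Calling `plt.show()` afterwards displays the figure in an interactive session. Using `index.minute` only works cleanly when the data stays within a single hour; for longer spans, plotting against the full datetime index avoids repeated x-values.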

7. Let's practice!

Now it's your turn. In the following exercises, you'll be plotting keyword prevalence across time.
