1. Plot aggregates of your data
The pandas library offers additional functionality to generate and plot interesting aggregates of your data. In the following exercises, we will look at some of the most common techniques used to display alternative aspects of your time series data such as rolling means and aggregated values.
2. Moving averages
A moving average, also known as a rolling mean, is a commonly used technique in time series analysis. It can be used to smooth out short-term fluctuations, dampen the effect of outliers, and highlight long-term trends or cycles. Taking the rolling mean of your time series is equivalent to "smoothing" your time series data. In pandas, the .rolling() method allows you to specify the number of data points to use when computing your metrics.
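To make the idea of a window concrete, here is a minimal sketch on a made-up five-point series (the values are hypothetical, not the course data). With a window of 3, each output is the mean of the current point and the two before it, so the first two outputs have no complete window and come back as NaN.

```python
import pandas as pd

# Hypothetical toy values, chosen only to make the arithmetic easy to follow
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Mean of each point and the two points before it
smoothed = values.rolling(window=3).mean()

print(smoothed.tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```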
3. The moving average model
Here, you specify a sliding window of 52 points and compute the mean of those 52 points as the window moves along the date axis. The number of points to use when computing moving averages depends on the application, and it is usually set through trial and error or according to some seasonality. For example, you could take the rolling mean of daily data and specify a window of 7 to obtain weekly moving averages. In our case, we are working with weekly data, so we specified a window of 52 (because there are 52 weeks in a year) in order to capture the yearly rolling mean.
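The 52-week window described here might look like the sketch below. The co2_levels DataFrame is a synthetic stand-in built for illustration (an assumed upward trend plus an assumed 52-week cycle), not the dataset used in the course, and the column name co2 is also an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for co2_levels (assumption): 260 weekly observations
# with a linear trend plus a cycle that repeats every 52 weeks.
dates = pd.date_range("2000-01-02", periods=260, freq="W")
weeks = np.arange(260)
co2_levels = pd.DataFrame(
    {"co2": 370 + 0.03 * weeks + 3 * np.sin(2 * np.pi * weeks / 52)},
    index=dates,
)

# Yearly rolling mean: average over a sliding window of 52 weekly points
co2_levels_mean = co2_levels.rolling(window=52).mean()

# The first 51 rows have no complete 52-point window, so they are NaN
print(co2_levels_mean["co2"].isna().sum())  # 51
```

Because the synthetic cycle lasts exactly 52 weeks, a 52-point window averages it out completely and only the trend remains. Real data will not cancel this cleanly.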
4. A plot of the moving average for the CO2 data
This is your yearly rolling mean.
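A plot like this one could be produced roughly as follows. This is a sketch on the same kind of synthetic stand-in data (assumed trend, assumed cycle, assumed column name co2); the axis labels and title are illustrative choices rather than the ones on the slide.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for co2_levels (assumption, not the course data)
dates = pd.date_range("2000-01-02", periods=260, freq="W")
weeks = np.arange(260)
co2_levels = pd.DataFrame(
    {"co2": 370 + 0.03 * weeks + 3 * np.sin(2 * np.pi * weeks / 52)},
    index=dates,
)

co2_levels_mean = co2_levels.rolling(window=52).mean()

# DataFrame.plot() returns a matplotlib Axes that can be labelled afterwards
ax = co2_levels_mean.plot(figsize=(8, 4))
ax.set_xlabel("Date")
ax.set_ylabel("CO2 level (52-week rolling mean)")
ax.set_title("Yearly rolling mean of weekly CO2 data")
# In an interactive session, call plt.show() to display the figure
```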
5. Computing aggregate values of your time series
Another useful technique to visualize time series data is to take aggregates of the values in your data. For example, the co2_levels data contains weekly data, but you may wish to see how these values behave by month of the year. Because you have set the index of your co2_levels DataFrame as a datetime type, it is possible to directly extract the day, month or year of each date in the index. For example, you can extract the month using the command co2_levels.index.month. Similarly, you can extract the year using the command co2_levels.index.year.
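As a small illustration of these index attributes, the sketch below builds a six-row weekly DataFrame with placeholder values (hypothetical, not real CO2 readings) and reads the month, year and day straight off its DatetimeIndex.

```python
import pandas as pd

# Six consecutive Sundays starting 2 January 2000; values are placeholders
dates = pd.date_range("2000-01-02", periods=6, freq="W")
co2_levels = pd.DataFrame({"co2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=dates)

# Each attribute returns one integer per row of the index
print(list(co2_levels.index.month))  # [1, 1, 1, 1, 1, 2]
print(list(co2_levels.index.year))   # [2000, 2000, 2000, 2000, 2000, 2000]
print(list(co2_levels.index.day))    # [2, 9, 16, 23, 30, 6]
```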
6. Plotting aggregate values of your time series
Aggregating values in a time series can help answer questions such as "what is the mean value of our time series on Sundays", or "what is the mean value of our time series during each month of the year". If the index of your pandas DataFrame consists of datetime types, then you can extract the indices and group your data by these values. Here, you use the .groupby() and .mean() methods to compute the monthly averages of the CO2 levels data and assign that to a new variable called co2_levels_by_month. The .groupby() method allows you to group records into buckets based on a set of defined categories. In this case, the categories are the different months of the year.
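The grouping step could be sketched as below. The data is again a synthetic stand-in (an assumption), and index_month is just a helper name chosen for this example; the result has one row per calendar month, averaged across all years in the data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for co2_levels (assumption, not the course data)
dates = pd.date_range("2000-01-02", periods=260, freq="W")
weeks = np.arange(260)
co2_levels = pd.DataFrame(
    {"co2": 370 + 0.03 * weeks + 3 * np.sin(2 * np.pi * weeks / 52)},
    index=dates,
)

# Month number (1 to 12) of every row, used as the grouping key
index_month = co2_levels.index.month

# One bucket per month; .mean() averages the rows that fall in each bucket
co2_levels_by_month = co2_levels.groupby(index_month).mean()

print(co2_levels_by_month.shape)  # (12, 1)
```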
7. Plotting aggregate values of your time series
When we plot co2_levels_by_month, we see that the monthly mean value of CO2 levels peaks during the 5th to 7th months of the year. This is consistent with the seasonal cycle of vegetation in the Northern Hemisphere: CO2 builds up through winter and spring, then falls once summer plant growth absorbs more of it. I really like this example, as it shows the power of plotting aggregated values of time series data.
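Plotting the monthly means and locating the peak might look like this sketch. Note the loud assumption: the synthetic data here is deliberately constructed to peak in month 5, so the code demonstrates the mechanics only and says nothing about real CO2 measurements.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic weekly data whose value depends only on the month and is
# built to be highest in May (assumption made for illustration)
dates = pd.date_range("2000-01-02", periods=208, freq="W")
months = dates.month.to_numpy()
co2_levels = pd.DataFrame(
    {"co2": 370 + 3 * np.cos(2 * np.pi * (months - 5) / 12)},
    index=dates,
)

co2_levels_by_month = co2_levels.groupby(co2_levels.index.month).mean()

# idxmax() returns the index label (the month number) of the largest mean
peak_month = co2_levels_by_month["co2"].idxmax()
print(peak_month)  # 5

ax = co2_levels_by_month.plot(figsize=(8, 4))
ax.set_xlabel("Month")
ax.set_ylabel("Mean CO2 level")
```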
8. Let's practice!
Ok! Let's work on displaying aggregated information about your time series data!