
Resampling and aggregating observations

1. Resampling and aggregating observations

Great work!

2. Sampling frequency

As a reminder, the sampling frequency of a time series is the number of observations per unit of time, such as per year. It describes how much time passes between observations: weekly, daily, and so on. Sampling frequency can be thought of as the "temporal resolution" of the data; the shorter the interval between observations, the higher the resolution. For example, data sampled every hour, minute, or second has a relatively high temporal resolution, while data sampled every month, quarter, or year has a relatively low temporal resolution. Note that "high" and "low" resolution are relative: a geologist might consider yearly data high resolution, while a particle physicist might consider minutes and seconds low resolution.
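In base R, the ts class stores this idea directly as a frequency attribute. The short sketch below uses made-up values (not data from this lesson) and treats one year as the unit of time, so monthly data has a frequency of 12 and quarterly data a frequency of 4.

```r
# Hypothetical values; only the time structure matters here.
# Monthly data: 12 observations per year (the unit of time)
monthly <- ts(1:24, start = c(2020, 1), frequency = 12)

# Quarterly data: 4 observations per year
quarterly <- ts(1:8, start = c(2020, 1), frequency = 4)

frequency(monthly)    # 12
frequency(quarterly)  # 4

# deltat() is the time between observations, as a fraction of a year
deltat(monthly)       # 1/12, about 0.0833
```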

3. Aggregation

In time series analysis, aggregation is the process of resampling data from a higher resolution to a lower one. It works by grouping the observations into periods at the new, lower resolution and applying a summary function, such as the sum, mean, or maximum, to the values in each period. For example, we can aggregate daily data to get the monthly total of the observations, or the weekly average of the values. Because aggregation relies on summary functions, it only works in one direction: if we had only monthly sales totals, we could not recover the original daily values. Aggregation therefore summarizes the patterns in the data at the cost of reducing the information available to us!
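A minimal base R sketch of the one-way nature of aggregation, using two weeks of invented daily sales figures: tapply() groups the days by a week label and applies sum() or mean() to each group. The two weeks have very different daily patterns but identical weekly summaries, so the summaries alone cannot tell us which daily pattern produced them.

```r
# Two hypothetical weeks of daily sales with very different daily patterns
week_a <- c(10, 10, 10, 10, 10, 10, 10)
week_b <- c(0, 0, 0, 70, 0, 0, 0)
daily_sales <- c(week_a, week_b)

# Label each day with the week it belongs to: seven 1s, then seven 2s
week_id <- rep(1:2, each = 7)

# Aggregate from daily to weekly resolution
weekly_total <- tapply(daily_sales, week_id, sum)   # 70 and 70
weekly_mean  <- tapply(daily_sales, week_id, mean)  # 10 and 10

# Both weeks collapse to the same summaries, so the daily values
# cannot be reconstructed from the weekly ones.
```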

4. Aggregating data with xts

The most useful R functions for aggregating time series data come from the xts package. xts, or "eXtensible Time Series", extends the zoo package and provides many useful functions and workflows for manipulating time series data. Let's consider the xts functions that aggregate data to different periods of time, or temporal resolutions. These are the apply-dot functions, and they take a time series and a summary function. For example, apply.yearly() aggregates data to the yearly level. We set the x argument to the time series, here maunaloa, and FUN to the function we want to apply to the values within each year, here mean.
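The maunaloa series itself isn't reproduced here, so the sketch below builds a hypothetical stand-in: three years of invented monthly values stored as an xts object. It then calls apply.yearly() with x and FUN set as described above, assuming the xts package is installed.

```r
library(xts)

# Hypothetical stand-in for maunaloa: 36 months of slowly rising values
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 36)
co2 <- xts(370 + seq_len(36) / 12, order.by = dates)

# One row per year: the mean of the monthly values within that year
co2_yearly <- apply.yearly(x = co2, FUN = mean)

# Each yearly value is indexed at the last observation of its year
co2_yearly
```

Swapping FUN = mean for max or sum would give the yearly maximum or total instead.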

5. Aggregating data with xts

For reference, here's the plot of the original, unaggregated data. And here's the aggregated version! Although we lose some information, such as how the CO2 concentration changes within each year, we can now see the general trend in the data more clearly. This kind of aggregation is useful when we want to keep our visuals from becoming too cluttered.

6. apply-dot functions

xts has many of these apply-dot functions, such as apply.daily(), apply.weekly(), apply.monthly(), apply.quarterly(), and apply.yearly(). The apply-dot functions make aggregation a breeze!
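As a sketch, here are several apply-dot functions run on the same hypothetical daily series: an invented value of 1 on every day of 2021, so that each period's sum is simply the number of days in it. It assumes the xts package is installed.

```r
library(xts)

# Hypothetical daily series: the value 1 on every day of 2021
days <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")
daily_data <- xts(rep(1, length(days)), order.by = days)

# Each call aggregates the same data to a different resolution
weekly_max    <- apply.weekly(daily_data, FUN = max)     # one row per week
monthly_sum   <- apply.monthly(daily_data, FUN = sum)    # days in each month
quarterly_sum <- apply.quarterly(daily_data, FUN = sum)  # days in each quarter

as.numeric(quarterly_sum)  # 90 91 92 92
```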

7. Endpoints and period.apply

Behind the scenes, the apply-dot functions use two other xts functions: endpoints() and period.apply(). With these, we can aggregate our data to any interval we like, such as every four days, every three weeks, or every seventeen minutes. Here's an example. To use endpoints(), we pass in a time series, set the on argument to the units we want, and set k to the number of those units. Here, we name the result biweekly because it marks every two weeks. The output is a vector of row indices into the daily_data time series: it starts at 0, and each following value is the position of the last observation in a two-week period. Then we call period.apply(), setting x to our time series, INDEX to the endpoints we created, and FUN to the summary function we want to use.
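A sketch of the two-step recipe, assuming xts is installed. The daily_data series here is a hypothetical stand-in (28 invented daily values starting on Monday, 1 January 2024), and biweekly follows the naming used above.

```r
library(xts)

# Hypothetical stand-in for daily_data: 28 days of invented values
days <- seq(as.Date("2024-01-01"), by = "day", length.out = 28)
daily_data <- xts(1:28, order.by = days)

# Step 1: row indices marking the end of each two-week period.
# The vector always starts at 0 and ends at nrow(daily_data).
biweekly <- endpoints(daily_data, on = "weeks", k = 2)

# Step 2: apply mean() to the rows between consecutive endpoints
biweekly_mean <- period.apply(x = daily_data, INDEX = biweekly, FUN = mean)

# apply.weekly() is the same recipe with on = "weeks" and the default k = 1
weekly_sum <- period.apply(daily_data, endpoints(daily_data, on = "weeks"), sum)
all.equal(weekly_sum, apply.weekly(daily_data, FUN = sum))  # TRUE
```

Changing on to "days" with k = 4, or on to "minutes" with k = 17, would give the other intervals mentioned above.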

8. Let's practice!

Head over to the exercises and aggregate some data!