1. Handling missingness
One of the most common challenges when working with time series data is the problem of missingness.
2. Missingness
If our data contains a large number of missing values, it can be difficult to identify trends in the data and perhaps impossible to conduct statistical analysis.
Luckily, xts and zoo offer a handful of commands to help fill in missing data with relevant values.
3. Fill NAs with last observation
One of the most common methods for handling missing data is last observation carried forward, or LOCF. This approach fills each missing value with the most recent non-missing value that comes before it in our data.
For example, let's say we have annual time series data through the 1980s, but we're missing observations for 1985 through 1987.
The na.locf() command will replace the NAs for these years with the last observation carried forward, in this case the value from 1984.
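The LOCF logic is simple enough to sketch outside of R. The block below is a minimal plain-Python illustration of the idea, not zoo's implementation of na.locf(). It assumes None stands in for NA, and the annual values for the 1980s are made up. This sketch simply leaves any leading gaps missing, since there is no earlier observation to carry forward.

```python
from typing import List, Optional


def locf(values: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: fill each None with the most
    recent non-None value before it. Leading Nones stay None."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v  # remember the latest real observation
        filled.append(last)  # a gap reuses the remembered value
    return filled


# Hypothetical annual data for 1980-1989 with 1985-1987 missing.
years = list(range(1980, 1990))
values = [80.0, 85.0, 90.0, 95.0, 100.0, None, None, None, 200.0, 210.0]

print(locf(values)[4:8])  # → [100.0, 100.0, 100.0, 100.0]
```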
4. Fill NAs with next observation
A similar technique takes the opposite approach by using the next observation carried backward, or NOCB.
Instead of using the data from 1984, this technique takes our existing data from 1988 and carries it backward to fill 1985 through 1987.
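NOCB is the same scan run in reverse. In zoo this corresponds to calling na.locf() with fromLast = TRUE. The plain-Python sketch below is only an illustration: it again assumes None marks a missing value and uses made-up 1980s data. It walks the series from the end, so each gap takes the next observed value, and trailing gaps stay missing.

```python
from typing import List, Optional


def nocb(values: List[Optional[float]]) -> List[Optional[float]]:
    """Next observation carried backward: fill each None with the next
    non-None value after it. Trailing Nones stay None."""
    filled: List[Optional[float]] = []
    nxt: Optional[float] = None
    for v in reversed(values):  # scan from the latest point to the earliest
        if v is not None:
            nxt = v
        filled.append(nxt)
    filled.reverse()  # restore chronological order
    return filled


# Hypothetical annual data for 1980-1989 with 1985-1987 missing.
values = [80.0, 85.0, 90.0, 95.0, 100.0, None, None, None, 200.0, 210.0]

print(nocb(values)[5:9])  # → [200.0, 200.0, 200.0, 200.0]
```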
5. Linear interpolation
In some cases, it may make more sense to estimate values that fall somewhere between our existing data points. This approach, known as linear interpolation, draws a straight line between the observations on either side of a gap and places each new value on that line according to its position in time.
In our example, 1985 is one year into the four-year gap between 1984 and 1988, so its interpolated value is one-quarter of the way from the 1984 value to the 1988 value. The interpolated value for 1986 is halfway between the two, and the value for 1987 is three-quarters of the way.
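The arithmetic behind those fractions can be checked with a short sketch. zoo provides na.approx() for linear interpolation; the plain-Python function below only illustrates the idea and is not zoo's implementation. It assumes the time points are sorted in ascending order and that None marks a missing value. With hypothetical values of 100 in 1984 and 200 in 1988, the gap years land one-quarter, one-half, and three-quarters of the way between them.

```python
from typing import List, Optional


def interpolate_linear(
    times: List[int], values: List[Optional[float]]
) -> List[Optional[float]]:
    """Fill interior None gaps by linear interpolation in time.
    Gaps before the first or after the last observation stay None.
    Assumes `times` is sorted ascending."""
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    filled = list(values)
    for i, (t, v) in enumerate(zip(times, values)):
        if v is not None:
            continue
        before = [(kt, kv) for kt, kv in known if kt < t]
        after = [(kt, kv) for kt, kv in known if kt > t]
        if not before or not after:
            continue  # no bracketing observations, so leave the gap
        t0, v0 = before[-1]  # nearest observation before the gap
        t1, v1 = after[0]    # nearest observation after the gap
        frac = (t - t0) / (t1 - t0)  # share of the time distance covered
        filled[i] = v0 + frac * (v1 - v0)
    return filled


# Hypothetical annual data for 1980-1989 with 1985-1987 missing.
years = list(range(1980, 1990))
values = [80.0, 85.0, 90.0, 95.0, 100.0, None, None, None, 200.0, 210.0]

print(interpolate_linear(years, values)[5:8])  # → [125.0, 150.0, 175.0]
```

Weighting by elapsed time rather than by position in the list means the same function also handles irregularly spaced observations.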
Which technique is best depends on the type of data you're using and the trends you expect to find.
6. Let's practice!
In the next few exercises, you'll address the problem of missingness using economic data from the United States.
Let's practice!