Missing values
Sometimes there are missing values in time series data, denoted NA in R, and it is useful to know their locations. It is also important to know how missing values are handled by various R functions. Sometimes we may want to ignore any missingness, but other times we may wish to impute or estimate the missing values.
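As a minimal sketch of how missing values can be located, the snippet below builds a small toy series (the values and the name x are hypothetical, chosen only for illustration) and uses the base R functions is.na(), which(), and time() to find the NA entries.

```r
# Toy monthly series with two missing values (hypothetical data)
x <- ts(c(5, NA, 7, 8, NA, 10), start = c(2020, 1), frequency = 12)

na_flags <- is.na(x)         # TRUE wherever an observation is missing
na_index <- which(na_flags)  # integer positions of the missing values
na_count <- sum(na_flags)    # how many values are missing
na_times <- time(x)[na_flags]  # time stamps (in years) of the missing values

print(na_index)
print(na_count)
print(na_times)
```

The positions returned by which() can then be used to index the series directly, for example when assigning imputed values.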
Let's again consider the monthly AirPassengers dataset, but now the data for the year 1956 are missing. In this exercise, you'll explore the implications of this missing data and impute some new data to solve the problem.
The mean() function calculates the sample mean, but it returns NA in the presence of any NA values. Use mean(___, na.rm = TRUE) to calculate the mean with all missing values removed. It is common to replace missing values with the mean of the observed values. Does this simple data imputation scheme appear adequate when applied to the AirPassengers dataset?
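The behaviour described above can be sketched outside the exercise environment as follows. This is an illustration under stated assumptions, not the exercise solution: the names ap, m, and ap_imputed are hypothetical, and the missing year is recreated by blanking out observations 85 to 96 of the built-in AirPassengers series (January to December 1956, since the series starts in January 1949).

```r
# Work on a copy so the built-in dataset stays untouched
ap <- AirPassengers
ap[85:96] <- NA   # assumption: recreate the missing 1956 data

mean_with_na <- mean(ap)          # NA, because missing values propagate
m <- mean(ap, na.rm = TRUE)       # mean of the observed values only

# Mean imputation: replace every NA with the observed mean
ap_imputed <- ap
ap_imputed[is.na(ap_imputed)] <- m

# Compare the imputed series with the complete data
plot(ap_imputed)
lines(AirPassengers, col = 2, lty = 3)
```

Because every missing month receives the same value, the imputed stretch appears as a flat segment in the plot, whereas the complete series shows both an upward trend and a seasonal pattern. That contrast is a useful thing to look for when judging whether mean imputation is adequate here.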
This exercise is part of the course
Time Series Analysis in R
Exercise instructions
- Use plot() to display a simple plot of AirPassengers. Note the missing data for 1956.
- Use mean() to calculate the sample mean of AirPassengers with the missing data removed (na.rm = TRUE).
- Run the pre-written code to impute the mean values into your missing data.
- Use another call to plot() to replot your newly imputed AirPassengers data.
- Run the pre-written code to add the complete AirPassengers data to your plot.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Plot the AirPassengers data
plot(___)
# Compute the mean of AirPassengers
# Impute mean values to NA in AirPassengers
AirPassengers[85:96] <- mean(AirPassengers, na.rm = ___)
# Generate another plot of AirPassengers
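# Add the complete AirPassengers data to your plot
# rm() deletes the modified workspace copy, so AirPassengers refers to the complete built-in dataset again
rm(AirPassengers)
points(AirPassengers, type = "l", col = 2, lty = 3)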