
Missing values

Sometimes there are missing values in time series data, denoted NA in R, and it is useful to know their locations. It is also important to know how missing values are handled by various R functions. Sometimes we may want to ignore any missingness, but other times we may wish to impute or estimate the missing values.
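Locating missing values can be sketched on a small example. The series x below is a hypothetical toy series invented for illustration, not part of the exercise data; is.na(), which(), and time() are standard R functions.

```r
# Hypothetical toy series (an assumption for illustration):
# monthly values for 2020 with two gaps
x <- ts(c(5, 7, NA, 6, 8, NA, 9, 10, 12, 11, 13, 14),
        start = c(2020, 1), frequency = 12)

# TRUE wherever a value is missing
is.na(x)

# Integer positions of the missing values
na_idx <- which(is.na(x))
na_idx                # 3 6

# The same locations expressed on the time axis of the series
time(x)[na_idx]

# Number of missing values
sum(is.na(x))         # 2
```

Working with positions rather than dates is convenient here because the exercise code also addresses the series by position.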

Let's again consider the monthly AirPassengers dataset, but now the data for the year 1956 are missing. In this exercise, you'll explore the implications of this missing data and impute some new data to solve the problem.
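A series with this kind of gap could be reproduced along the following lines. This is only a sketch: how the exercise environment actually prepares its data is not shown in the source, and the name ap is a hypothetical copy used here so that the built-in dataset is left untouched.

```r
# Hypothetical copy of the built-in dataset (the name `ap` is an assumption)
ap <- AirPassengers

# AirPassengers is monthly and starts in January 1949, so the 12 months
# of 1956 occupy positions 85 to 96 (7 full years = 84 months come first)
ap[85:96] <- NA

# Inspect the affected year: every month of 1956 is now NA
window(ap, start = c(1956, 1), end = c(1956, 12))

# Count the missing observations
sum(is.na(ap))        # 12
```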

The mean() function calculates the sample mean, but it returns NA if any value is missing. Use mean(___, na.rm = TRUE) to calculate the mean with all missing values removed. It is common to replace missing values with the mean of the observed values. Does this simple data imputation scheme appear adequate when applied to the AirPassengers dataset?
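The behaviour of mean() and the mean-imputation idea can be illustrated on a short toy series. The values and the names x, obs_mean, and x_imputed are assumptions made up for this sketch; they are not the exercise data.

```r
# Hypothetical toy series with two missing values
x <- ts(c(5, 7, NA, 6, 8, NA), start = c(2020, 1), frequency = 12)

# With the default na.rm = FALSE, a single NA makes the result NA
mean(x)                              # NA

# Dropping the missing values first gives the mean of the observed values
obs_mean <- mean(x, na.rm = TRUE)    # (5 + 7 + 6 + 8) / 4 = 6.5

# Mean imputation: every missing value is replaced by the same constant
x_imputed <- x
x_imputed[is.na(x_imputed)] <- obs_mean
x_imputed
```

Because every gap receives the same value, the imputed stretch is flat regardless of what the neighbouring observations look like.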

This exercise is part of the course

Time Series Analysis in R


Exercise instructions

  • Use plot() to display a simple plot of AirPassengers. Note the missing data for 1956.
  • Use mean() to calculate the sample mean of AirPassengers with the missing data removed (na.rm = TRUE).
  • Run the pre-written code to replace the missing values with the sample mean.
  • Use another call to plot() to replot your newly imputed AirPassengers data.
  • Run the pre-written code to add the complete AirPassengers data to your plot.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Plot the AirPassengers data
plot(___)

# Compute the mean of AirPassengers


# Impute mean values to NA in AirPassengers
AirPassengers[85:96] <- mean(AirPassengers, na.rm = ___)

# Generate another plot of AirPassengers


# Add the complete AirPassengers data to your plot
rm(AirPassengers)
points(AirPassengers, type = "l", col = 2, lty = 3)
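The rm() call in the last step deserves a note. Assigning into a built-in dataset creates a modified copy in the workspace that masks the original; removing that copy makes the name refer to the complete built-in data again. A minimal sketch of this behaviour, assuming a fresh R session in which AirPassengers has not yet been modified:

```r
# Assigning into a built-in dataset creates a modified copy in the workspace
AirPassengers[85:96] <- NA
anyNA(AirPassengers)      # TRUE: the workspace copy has a gap

# Removing the workspace copy uncovers the original from the datasets package
rm(AirPassengers)
anyNA(AirPassengers)      # FALSE: the complete data are visible again
```

This is why the final points() call in the exercise can overlay the complete series on top of the imputed plot.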