Exercise

Missing values

Sometimes there are missing values in time series data, denoted NA in R, and it is useful to know their locations. It is also important to know how missing values are handled by various R functions. Sometimes we may want to ignore any missingness, but other times we may wish to impute or estimate the missing values.

Let's again consider the monthly AirPassengers dataset, but now the data for the year 1956 are missing. In this exercise, you'll explore the implications of this missing data and impute some new data to solve the problem.

The mean() function calculates the sample mean, but it returns NA if any value is missing. Use mean(___, na.rm = TRUE) to calculate the mean with all missing values removed. It is common to replace missing values with the mean of the observed values. Does this simple data imputation scheme appear adequate when applied to the AirPassengers dataset?
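As a quick illustration of how NA values affect mean(), here is a minimal sketch on a small made-up vector. The vector x and its values are hypothetical and are not part of the exercise data.

```r
# Hypothetical toy vector with one missing value (not the exercise data)
x <- c(112, 118, NA, 129)

# Locate the missing entries
which(is.na(x))        # position 3

# Without na.rm, the NA propagates to the result
mean(x)                # NA

# With na.rm = TRUE, the mean uses only the observed values
mean(x, na.rm = TRUE)
```

The same na.rm argument is available in several other summary functions, such as sum() and sd().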

Instructions

  • Use plot() to display a simple plot of AirPassengers. Note the missing data for 1956.
  • Use mean() to calculate the sample mean of AirPassengers with the missing data removed (na.rm = TRUE).
  • Run the pre-written code to impute the mean values into your missing data.
  • Use another call to plot() to replot your newly imputed AirPassengers data.
  • Run the pre-written code to add the complete AirPassengers data to your plot.
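The steps above could be sketched roughly as follows. This is not the exercise's pre-written code. It assumes you start from the complete built-in AirPassengers series and recreate the 1956 gap yourself. The names ap, is_1956, and obs_mean are made up for illustration.

```r
# Work on a copy so the built-in dataset stays untouched
ap <- AirPassengers

# Assumption: recreate the exercise's gap by setting all of 1956 to NA
is_1956 <- floor(as.numeric(time(ap))) == 1956
ap[is_1956] <- NA

# Plot the series; the missing year shows up as a break in the line
plot(ap)

# Sample mean of the observed values only
obs_mean <- mean(ap, na.rm = TRUE)

# Impute: replace every NA with the mean of the observed values
ap[is.na(ap)] <- obs_mean

# Replot the imputed series, then overlay the complete data for comparison
plot(ap)
lines(AirPassengers, col = 2, lty = 3)
```

In the second plot, the imputed year appears as a flat segment. A single constant cannot reproduce the upward trend or the seasonal pattern in the complete series, which is the comparison the final overlay is meant to highlight.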