Exercise

Estimating missing values with missMDA

As you saw in the video, R has two packages for conducting PCA on a dataset with missing values: pcaMethods and missMDA. In this exercise, you are going to use the first method introduced in the video: combining missMDA and FactoMineR. Both packages are loaded for you in this exercise.

The two-step procedure includes a) estimating the missing values with an iterative PCA algorithm and b) estimating the number of dimensions for PCA by cross-validation.
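
To make the idea of an iterative PCA algorithm more concrete, here is a minimal base R sketch. It is not the missMDA implementation (which uses a regularized variant and scales the variables); the function name iterative_pca_impute and its arguments are hypothetical and exist only for illustration. Missing cells start at the column means, a low-rank PCA fit is computed, the missing cells are replaced by the fitted values, and the loop repeats until the imputed values stop changing.

```r
# Simplified, unregularized iterative PCA imputation (illustration only).
# `iterative_pca_impute`, `max_iter` and `tol` are hypothetical names.
iterative_pca_impute <- function(X, ncp = 2, max_iter = 500, tol = 1e-6) {
  X <- as.matrix(X)
  missing <- is.na(X)

  # Step 0: initialise each missing cell with the mean of its column
  col_means <- colMeans(X, na.rm = TRUE)
  X_hat <- X
  X_hat[missing] <- col_means[col(X)[missing]]

  for (i in seq_len(max_iter)) {
    # Step 1: PCA (via SVD) on the centred, currently completed data
    mu <- colMeans(X_hat)
    centered <- sweep(X_hat, 2, mu)
    s <- svd(centered, nu = ncp, nv = ncp)

    # Step 2: rank-ncp reconstruction, moved back to the original location
    fitted <- s$u %*% diag(s$d[seq_len(ncp)], ncp, ncp) %*% t(s$v)
    fitted <- sweep(fitted, 2, mu, "+")

    # Step 3: overwrite only the missing cells with their fitted values
    X_new <- X_hat
    X_new[missing] <- fitted[missing]

    # Stop when the imputed values have stabilised
    change <- sum((X_new - X_hat)^2)
    X_hat <- X_new
    if (change < tol) break
  }
  X_hat
}

# Usage on a small toy matrix with two missing cells
toy <- matrix(c(1, 2, 3, 4, 5, 6,
                2, 4, NA, 8, 10, 12,
                1, 1, 2, NA, 3, 3), ncol = 3)
toy_completed <- iterative_pca_impute(toy, ncp = 1)
```

Observed cells are never modified; only the missing cells are updated at each pass. The number of dimensions ncp has to be chosen beforehand, which is why it is estimated by cross-validation.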

In this exercise, you will be working with the ozone dataset of the missMDA package, which includes 112 daily measurements of meteorological variables (wind speed, temperature, rainfall, etc.) and ozone concentration recorded in Rennes (France) during the summer of 2001.

Instructions
  • Find the number of cells with missing values in the first 11 continuous variables of the ozone dataset.
  • Estimate the number of dimensions used for imputing the data.
  • The last step is to use the estimated number stored in ozone_ncp in order to actually conduct the data imputation on ozone[,1:11].
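
The three steps above can be sketched as follows. This is one possible approach rather than the official solution: it assumes the missMDA functions estim_ncpPCA() and imputePCA() with their default settings, and the names n_missing and ozone_complete are chosen here for illustration.

```r
library(missMDA)

# Load the ozone dataset shipped with missMDA
data(ozone)

# Step 1: count the cells with missing values in the first 11 (continuous) variables
n_missing <- sum(is.na(ozone[, 1:11]))
n_missing

# Step 2: estimate the number of dimensions to use for the imputation
# (cross-validation with the function's default settings)
ozone_ncp <- estim_ncpPCA(ozone[, 1:11])
ozone_ncp$ncp

# Step 3: impute the missing values with the estimated number of dimensions
ozone_imputed <- imputePCA(ozone[, 1:11], ncp = ozone_ncp$ncp, scale = TRUE)

# The completed data table is stored in the completeObs element
ozone_complete <- ozone_imputed$completeObs
head(ozone_complete)
```

The completed table in ozone_complete can then be passed to FactoMineR's PCA() function for the analysis itself.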