Exercise

Imputing missing data

In this exercise, we will use a generalized low-rank model (GLRM) to impute missing data. We are going to build a GLRM model from a dataset named fashion_mnist_miss, in which 20% of the values are missing. The goal is to fill in these values by making a prediction with h2o.predict() using the GLRM model.
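
As a rough sketch of why a low-rank model can fill gaps: the model is fitted on the observed entries only, and the fitted factors then yield a value for every entry, including the missing ones. The notation below is an assumption for illustration and does not come from the exercise: A is the m-by-n data matrix, \Omega is the set of observed entries, x_i is row i of an m-by-k matrix X, y_j is column j of a k-by-n matrix Y, and L is a loss function such as squared error. Any regularization terms are left out.

```latex
\min_{X,\,Y} \; \sum_{(i,j) \in \Omega} L\bigl(x_i y_j,\; A_{ij}\bigr),
\qquad
\hat{A}_{ij} = x_i y_j \quad \text{for } (i,j) \notin \Omega
```

With k = 2, each row of the data is summarized by two numbers, and the imputed values come from the reconstruction X Y.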

An h2o instance is already running in this exercise, so it is not necessary to call h2o.init().

The h2o package and fashion_mnist_miss have been loaded.

Instructions
  • Store the input data in the h2o cluster.
  • Build a rank-2 GLRM model on normalized input data, limiting the number of iterations to 100.
  • Impute missing data by predicting with the previous model.
  • Inspect the basic statistics of the first five columns of the imputed data in the console.
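
The four steps above could be sketched in R roughly as follows. This is one possible approach, not necessarily the exercise's reference solution. It assumes the exercise environment: an h2o cluster is already running and the data frame fashion_mnist_miss is loaded. The object names fashion_data_miss, model_glrm and fashion_pred, and the seed value, are hypothetical choices made for illustration.

```r
library(h2o)

# Assumes a running h2o cluster (otherwise call h2o.init() first) and that
# the data frame fashion_mnist_miss is already loaded, as in the exercise.

# 1. Store the input data in the h2o cluster as an H2OFrame.
fashion_data_miss <- as.h2o(fashion_mnist_miss)

# 2. Fit a rank-2 GLRM on normalized data, capped at 100 iterations.
model_glrm <- h2o.glrm(
  training_frame = fashion_data_miss,
  transform = "NORMALIZE",   # normalize the input columns
  k = 2,                     # rank of the low-rank approximation
  max_iterations = 100,
  seed = 123                 # hypothetical seed, only for reproducibility
)

# 3. Predict with the model: this returns the low-rank reconstruction of
#    every entry, which supplies values where the input had missing data.
fashion_pred <- h2o.predict(model_glrm, fashion_data_miss)

# 4. Basic statistics of the first five reconstructed columns.
summary(fashion_pred[, 1:5])
```

Note that h2o.predict() reconstructs all entries, not only the missing ones, so the observed values in fashion_pred are approximations of the originals rather than exact copies.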