MICE imputation

The fancyimpute package offers a range of machine learning models for imputing missing values. You can explore the complete list of imputers in its documentation. Here, we will use IterativeImputer, popularly known as MICE (Multivariate Imputation by Chained Equations), to impute missing values.

The IterativeImputer models each column that has missing values as a function of the other columns. It fits these regressions in a round-robin fashion over several iterations and uses the predictions to fill in the missing entries. You will use the diabetes DataFrame to perform this imputation.
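The round-robin idea behind MICE can be illustrated with a small standard-library sketch. This is not fancyimpute's implementation: the helpers `fit_line` and `chained_impute` are hypothetical names, the data are made up, and only two columns with simple linear regression are handled.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def chained_impute(rows, n_iter=10):
    """Impute None entries in a list of [x, y] rows (hypothetical helper)."""
    # Remember which cells were missing in the original data
    missing = [[v is None for v in row] for row in rows]

    # Step 1: start from a simple fill using each column's observed mean
    filled = [row[:] for row in rows]
    for j in range(2):
        col_mean = mean(r[j] for r in rows if r[j] is not None)
        for r in filled:
            if r[j] is None:
                r[j] = col_mean

    # Step 2: repeatedly regress each column on the other one and
    # overwrite only the originally missing cells with the predictions
    for _ in range(n_iter):
        for target in range(2):
            other = 1 - target
            train = [(r[other], r[target])
                     for r, m in zip(filled, missing) if not m[target]]
            a, b = fit_line([t[0] for t in train], [t[1] for t in train])
            for r, m in zip(filled, missing):
                if m[target]:
                    r[target] = a + b * r[other]
    return filled


# Made-up data: observed rows lie on y = 2x + 1, so plausible
# imputations are close to y = 7 (third row) and x = 5 (fifth row)
rows = [[1, 3], [2, 5], [3, None], [4, 9], [None, 11], [6, 13]]
filled = chained_impute(rows)
print(filled[2][1], filled[4][0])
```

Observed values are never changed; only the cells that were missing are updated on each pass, which is why the estimates settle after a few iterations.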

This is a part of the course

“Dealing with Missing Data in Python”

Exercise instructions

  • Import IterativeImputer from fancyimpute.
  • Copy diabetes to diabetes_mice_imputed.
  • Create an IterativeImputer() object and assign it to mice_imputer.
  • Impute the diabetes_mice_imputed DataFrame.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import IterativeImputer from fancyimpute
___

# Copy diabetes to diabetes_mice_imputed
diabetes_mice_imputed = ___

# Initialize IterativeImputer
mice_imputer = ___

# Impute using fit_transform on diabetes_mice_imputed
diabetes_mice_imputed.iloc[:, :] = ___
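One possible way to complete the steps is sketched below. It rests on two assumptions: the `diabetes` DataFrame here is a small made-up stand-in for the course data, and the class is imported from scikit-learn rather than fancyimpute (recent fancyimpute releases are understood to re-export scikit-learn's IterativeImputer, so in the exercise environment the import would be `from fancyimpute import IterativeImputer`).

```python
import numpy as np
import pandas as pd

# Assumption: scikit-learn's IterativeImputer stands in for the one
# imported from fancyimpute in the exercise
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical stand-in for the course's diabetes DataFrame
diabetes = pd.DataFrame({
    "Glucose": [148.0, 85.0, np.nan, 89.0, 137.0, 116.0],
    "BMI": [33.6, 26.6, 23.3, np.nan, 43.1, 25.6],
    "Age": [50.0, 31.0, 32.0, 21.0, 33.0, 30.0],
})

# Copy so the original DataFrame keeps its missing values
diabetes_mice_imputed = diabetes.copy(deep=True)

# Initialize the imputer; random_state is set only for reproducibility
mice_imputer = IterativeImputer(random_state=0)

# fit_transform returns a NumPy array; assigning through iloc keeps
# the original column names and index
diabetes_mice_imputed.iloc[:, :] = mice_imputer.fit_transform(diabetes_mice_imputed)

print(diabetes_mice_imputed.isna().sum())
```

Working on a copy makes it possible to compare the imputed DataFrame against the original later, for example when evaluating different imputation techniques.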

This exercise is part of the course

Dealing with Missing Data in Python

Skill level: Intermediate
Rating: 4.2+ (11 reviews)

Learn how to identify, analyze, remove and impute missing data in Python.

Finally, go beyond simple imputation techniques and make the most of your dataset with advanced imputation techniques that rely on machine learning models, so that you can accurately impute missing data and evaluate the results. You will use methods such as KNN and MICE to get the most out of your data!

  • Exercise 1: Imputing using fancyimpute
  • Exercise 2: KNN imputation
  • Exercise 3: MICE imputation
  • Exercise 4: Imputing categorical values
  • Exercise 5: Ordinal encoding of a categorical column
  • Exercise 6: Ordinal encoding of a DataFrame
  • Exercise 7: KNN imputation of categorical values
  • Exercise 8: Evaluation of different imputation techniques
  • Exercise 9: Analyze the summary of linear model
  • Exercise 10: Comparing and choosing the best adjusted R-squared
  • Exercise 11: Comparing density plots
  • Exercise 12: Conclusion
