KNN imputation of categorical values

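Once all the categorical columns in the DataFrame have been converted to ordinal values, the DataFrame is ready to be imputed. A model-based method such as K-Nearest Neighbors (KNN), which fills each missing value using the most similar rows, usually gives better imputations than simple strategies such as filling in the most frequent category.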
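The idea behind KNN imputation can be sketched from scratch. For a row with a missing code, find the k most similar rows in which that column is observed, average their codes, and round the result to a valid code. The function below is a minimal illustration under our own assumptions: the distance is the root-mean-square difference over the features both rows share, and the neighbours are averaged without weights. The name knn_impute_cell and the toy array are hypothetical, and fancyimpute's KNN() may use different distance and weighting choices.

```python
import numpy as np


def knn_impute_cell(X, row, col, k=2):
    """Return a rounded KNN estimate for the missing entry X[row, col].

    Hypothetical helper: candidates are rows where `col` is observed; the
    distance is the root-mean-square difference over the features observed
    in both rows; the estimate is the unweighted mean of the k closest
    candidates' codes, rounded to the nearest integer code.
    """
    target = X[row]
    candidates = []
    for i, other in enumerate(X):
        if i == row or np.isnan(other[col]):
            continue  # skip the row itself and rows that cannot donate a value
        shared = ~np.isnan(target) & ~np.isnan(other)
        if not shared.any():
            continue  # no common features, so no meaningful distance
        dist = np.sqrt(np.mean((target[shared] - other[shared]) ** 2))
        candidates.append((dist, other[col]))
    candidates.sort(key=lambda pair: pair[0])
    neighbour_codes = [code for _, code in candidates[:k]]
    # Averaging ordinal codes can give a fraction, so round back to a valid code
    return float(np.round(np.mean(neighbour_codes)))


# Hypothetical ordinal codes for three categorical columns; row 0 misses column 2
X = np.array([
    [0.0, 1.0, np.nan],
    [0.0, 1.0, 2.0],
    [0.0, 2.0, 2.0],
    [1.0, 0.0, 0.0],
])

print(knn_impute_cell(X, row=0, col=2, k=2))  # 2.0
print(knn_impute_cell(X, row=0, col=2, k=3))  # 1.0
```

Rows 1 and 2 are closest to row 0 and both hold code 2, so k=2 returns 2.0. With k=3 the third neighbour (code 0) pulls the mean down to about 1.33, which rounds to 1.0, showing how the choice of k can change the imputed category.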

In this exercise, you'll

  1. Use the KNN() imputer from fancyimpute to impute the missing values in the ordinally encoded DataFrame users.
  2. Convert the ordinal values back to their respective categories using the ordinal encoder's .inverse_transform() method.

Remember, ordinal_enc_dict maps each column name to the sklearn OrdinalEncoder() fitted on that column, and the users DataFrame stores the encoded (ordinal) values of each column.

The KNN() imputer, the dictionary of OrdinalEncoder() objects ordinal_enc_dict, and the users DataFrame have already been loaded for you.
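For reference, an ordinal_enc_dict and an encoded users DataFrame like the ones described above could be built as follows. This is a sketch with a hypothetical two-column DataFrame (gender and plan). Each OrdinalEncoder is fit only on the non-missing values of its column, so missing entries stay NaN and are left for the imputer.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical raw categorical data with missing entries
raw = pd.DataFrame({
    "gender": ["female", "male", np.nan, "male"],
    "plan": ["basic", np.nan, "premium", "basic"],
})

ordinal_enc_dict = {}
# Start from an all-NaN float frame; codes are filled in column by column
users = pd.DataFrame(np.nan, index=raw.index, columns=raw.columns)

for col_name in raw:
    not_null = raw[col_name].notnull()
    # OrdinalEncoder expects 2-D input, hence reshape(-1, 1)
    values = raw.loc[not_null, col_name].to_numpy(dtype=object).reshape(-1, 1)
    encoder = OrdinalEncoder()
    codes = encoder.fit_transform(values)
    # Write codes only where a category was present; missing entries stay NaN
    users.loc[not_null, col_name] = codes.ravel()
    ordinal_enc_dict[col_name] = encoder

print(users)
# Each stored encoder can map a code back to its category
print(ordinal_enc_dict["plan"].inverse_transform(np.array([[1.0]])))
```

OrdinalEncoder assigns codes in sorted category order, so here "basic" becomes 0 and "premium" becomes 1.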

This exercise is part of the course “Dealing with Missing Data in Python”.

Exercise instructions

  • Impute the users DataFrame using KNN_imputer's fit_transform() method, then round the imputed values with np.round() so that every entry is an integer code.
  • Iterate over columns in users.
  • Select the column's OrdinalEncoder() from ordinal_enc_dict and call its .inverse_transform() method on the reshaped column array, reshaped.
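The rounding step matters because a KNN imputer averages its neighbours' codes, which often gives fractions such as 1.6 that match no category. The sketch below assumes a hypothetical plan column with three categories and rounds each imputed value to the nearest code before decoding it.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical encoder for a "plan" column: basic -> 0, premium -> 1, pro -> 2
encoder = OrdinalEncoder().fit(np.array([["basic"], ["premium"], ["pro"]], dtype=object))

# Hypothetical KNN output: neighbour averages are often fractional
imputed = np.array([0.4, 1.6, 2.0])

# Round to the nearest code, then reshape into the 2-D column the encoder expects
codes = np.round(imputed).reshape(-1, 1)
decoded = encoder.inverse_transform(codes).ravel()
print(decoded)  # decoded holds 'basic', 'pro', 'pro'
```

As long as the imputer uses a (weighted) mean of the neighbours' codes, each result lies between the smallest and largest code, so the rounded value stays within the encoder's range. Note that np.round rounds exact halves to the nearest even number, so 0.5 becomes 0 and 1.5 becomes 2.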

Have a go at this exercise by completing this sample code.

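# numpy and fancyimpute's KNN may already be loaded in the exercise environment
import numpy as np
from fancyimpute import KNN

# Create KNN imputer
KNN_imputer = KNN()

# Work on a copy so the encoded 'users' DataFrame is left untouched
users_KNN_imputed = users.copy(deep=True)

# Impute 'users' DataFrame and round the result to get integer codes
users_KNN_imputed.iloc[:, :] = np.round(___)

# Loop over the column names in 'users'
for col_name in ___:
    
    # Reshape the column data into the 2-D shape the encoder expects
    reshaped = users_KNN_imputed[col_name].values.reshape(-1, 1)
    
    # Select the column's Encoder and perform inverse transform on 'reshaped'
    users_KNN_imputed[col_name] = ___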
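A completed, self-contained version of the exercise is sketched below. Because fancyimpute can be awkward to install, this sketch swaps in scikit-learn's KNNImputer (sklearn.impute.KNNImputer, available since scikit-learn 0.22) for fancyimpute's KNN(). Both expose fit_transform(), but their defaults differ, so the imputed categories may not match the course's. The encoders, the column names (gender, plan, device) and the data are hypothetical stand-ins for the preloaded ordinal_enc_dict and users.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-ins for the preloaded ordinal_enc_dict and users
ordinal_enc_dict = {
    "gender": OrdinalEncoder().fit(np.array([["female"], ["male"]], dtype=object)),
    "plan": OrdinalEncoder().fit(np.array([["basic"], ["premium"], ["pro"]], dtype=object)),
    "device": OrdinalEncoder().fit(np.array([["desktop"], ["mobile"]], dtype=object)),
}
users = pd.DataFrame({
    "gender": [0.0, 1.0, np.nan, 1.0, 0.0, 1.0],
    "plan": [0.0, 2.0, 2.0, np.nan, 0.0, 1.0],
    "device": [1.0, 0.0, 0.0, 0.0, np.nan, 1.0],
})

# scikit-learn's KNNImputer used in place of fancyimpute's KNN()
KNN_imputer = KNNImputer(n_neighbors=3)

# Impute a copy of 'users' and round the result to integer codes
users_KNN_imputed = users.copy(deep=True)
users_KNN_imputed.iloc[:, :] = np.round(KNN_imputer.fit_transform(users))

# Decode every column back to its categories
for col_name in users:
    reshaped = users_KNN_imputed[col_name].values.reshape(-1, 1)
    decoded = ordinal_enc_dict[col_name].inverse_transform(reshaped)
    # inverse_transform returns shape (n, 1); flatten it for column assignment
    users_KNN_imputed[col_name] = decoded.ravel()

print(users_KNN_imputed)
```

Observed entries keep their original categories; only the NaN cells receive new values, each decoded from the rounded average of its three nearest rows.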
