NaN value imputation
Let's try to impute some values using the .transform() method. In the previous task you created a DataFrame fheroes in which all groups with an insufficient number of bmi observations were removed. Our bmi column still has a lot of missing values (NaNs), though. Given two copies of the fheroes DataFrame (imp_globmean and imp_grpmean), your task is to impute the NaNs in the bmi column with the overall mean value and with the mean value per group defined by the Publisher and Alignment factors, respectively.
Tip: pandas Series (and DataFrames) have a .fillna() method that replaces every NaN it encounters with the value passed as an argument. Plain NumPy arrays do not have this method.
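As a minimal sketch of how .fillna() combines with the mean, here is a small hand-made Series. The values are invented for illustration and are not from the fheroes data. Note that Series.mean() skips NaNs by default, so the fill value is the mean of the observed entries only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy BMI values with two gaps (illustration only)
bmi = pd.Series([20.0, np.nan, 30.0, np.nan, 25.0])

# mean() ignores NaNs (skipna=True by default): (20 + 30 + 25) / 3 = 25.0
filled = bmi.fillna(bmi.mean())

print(filled.tolist())  # [20.0, 25.0, 30.0, 25.0, 25.0]
```

fillna() returns a new Series and leaves bmi unchanged, which is why the result has to be assigned back (for example, to a DataFrame column).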
This exercise is part of the course Practicing Coding Interview Questions in Python.
Exercise instructions
- Define a lambda function that imputes NaN values in series with its mean.
- Impute NaNs in the bmi column of imp_globmean with the overall mean value.
- Impute NaNs in the bmi column of imp_grpmean with the mean value per group.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Define a lambda function that imputes NaN values in series
impute = lambda series: ____
# Impute NaNs in the bmi column of imp_globmean
imp_globmean['bmi'] = ____
print("Global mean = " + str(fheroes['bmi'].mean()) + "\n")
groups = imp_grpmean.groupby(['Publisher', 'Alignment'])
# Impute NaNs in the bmi column of imp_grpmean
imp_grpmean['bmi'] = groups[____].____
print(groups['bmi'].mean())
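For comparison, here is one possible way to do both imputations on a tiny, made-up stand-in for fheroes. The DataFrame df and its values are hypothetical; only the column names mirror the exercise. The key point is that groupby(...)['bmi'].transform(impute) calls the lambda once per (Publisher, Alignment) group and returns a Series aligned to the original row index, so it can be assigned straight back to the column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for fheroes: two Publisher/Alignment groups
df = pd.DataFrame({
    'Publisher': ['Marvel', 'Marvel', 'Marvel', 'DC', 'DC', 'DC'],
    'Alignment': ['good', 'good', 'good', 'bad', 'bad', 'bad'],
    'bmi': [20.0, np.nan, 30.0, 40.0, 50.0, np.nan],
})

# Replace NaNs in a Series with that Series' own mean
impute = lambda series: series.fillna(series.mean())

# Global imputation: one mean (35.0) for the whole column
glob = df.copy()
glob['bmi'] = impute(glob['bmi'])

# Group-wise imputation: each group is filled with its own mean
# (Marvel/good -> 25.0, DC/bad -> 45.0); transform keeps the original index order
grp = df.copy()
grp['bmi'] = grp.groupby(['Publisher', 'Alignment'])['bmi'].transform(impute)

print(glob['bmi'].tolist())  # [20.0, 35.0, 30.0, 40.0, 50.0, 35.0]
print(grp['bmi'].tolist())   # [20.0, 25.0, 30.0, 40.0, 50.0, 45.0]
```

Using transform() rather than an aggregation such as mean() matters here: an aggregation returns one value per group, whereas transform() returns one value per original row, which is the shape needed to overwrite the bmi column.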