Imputing missing values with percentiles
In this exercise, you'll continue to practice imputing missing values. Unlike the previous exercise, however, you will use percentiles instead of averages to compute the imputations. Percentiles are a good way to get conservative imputations: by taking a value from the unfavorable end of a column's distribution, you avoid filling gaps with figures that make a company look better than its observed data supports. Imputing missing values in a column using percentiles involves the following steps:
- Remove the missing values from the column of interest.
- Compute a percentile of the remaining values, say the 70th percentile "worst" value. Which percentile that is depends on the column:
  - Having a large amount of assets is considered good, so a low amount of assets is worse. The 70th percentile worst value of assets is therefore the 30th percentile of assets.
  - Conversely, high amounts of liabilities are considered bad, so the 70th percentile worst value of liabilities is simply its 70th percentile.
- Fill the missing values in the column with the value you computed.
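The rule for picking the "worst" percentile can be sketched as a small helper. This is an illustration, not part of the exercise: the function name worst_percentile_value and the higher_is_better flag are hypothetical, and the assets and liabilities values are made up. It drops missing values first and then calls np.percentile, which uses linear interpolation by default.

```python
import numpy as np
import pandas as pd


def worst_percentile_value(values: pd.Series, pct: float, higher_is_better: bool) -> float:
    """Hypothetical helper: return the `pct`-th percentile *worst* value.

    Missing values are removed before the percentile is computed.
    """
    non_missing = values.dropna()
    # If high values are good (e.g. assets), the worst end is the low end,
    # so the 70th percentile worst value is the 30th percentile.
    # If high values are bad (e.g. liabilities), it is the 70th percentile itself.
    q = 100 - pct if higher_is_better else pct
    return float(np.percentile(non_missing, q))


# Made-up example values, each with one missing entry.
assets = pd.Series([100.0, np.nan, 200.0, 300.0, 400.0, 500.0])
liabilities = pd.Series([10.0, 20.0, np.nan, 30.0, 40.0, 50.0])

print(worst_percentile_value(assets, 70, higher_is_better=True))
print(worst_percentile_value(liabilities, 70, higher_is_better=False))
```

With these numbers the assets call returns about 220.0 (the 30th percentile of 100 to 500) and the liabilities call returns about 38.0 (the 70th percentile of 10 to 50), so both land on the unfavorable side of the median.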
pandas has been loaded with the alias pd and NumPy with the alias np. A pandas DataFrame called dataset has been loaded for you. It has the column "Total Current Liabilities", which has some missing values.
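Before grouping by company or industry, it may help to see the steps applied to the whole column at once. The sketch below builds a small made-up stand-in for dataset (the real one is preloaded in the exercise and its values will differ). Because high liabilities are bad, it fills the gaps with the column's 70th percentile; Series.quantile ignores NaN, so the percentile is computed from the non-missing values only.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the exercise's preloaded `dataset`.
dataset = pd.DataFrame(
    {"Total Current Liabilities": [10.0, np.nan, 20.0, 30.0, np.nan, 40.0, 50.0]}
)

col = dataset["Total Current Liabilities"]

# Series.quantile skips NaN, so this is the 70th percentile of the
# non-missing values (linear interpolation by default, like np.percentile).
p70 = col.quantile(0.7)

# Replace each missing entry with that single conservative value.
imputed = col.fillna(p70)

print(p70)
print(imputed.tolist())
```

Using one column-wide value is the simplest variant; the group-wise version in the exercise computes a separate percentile for each company or industry instead.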
This exercise is part of the course Analyzing Financial Statements in Python.
Have a go at this exercise by completing this sample code.
# Impute missing values with the 70th percentile of each company's non-missing values
impute_by_company = ___
# Impute missing values with the 70th percentile of each industry's non-missing values
impute_by_comp_type = ___
print(impute_by_company)
print(impute_by_comp_type)
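One way to compute group-wise percentiles in pandas is groupby followed by transform, which broadcasts each group's scalar result back onto every row of that group. The sketch below is not the exercise's answer: it uses a made-up DataFrame, and the grouping columns "company" and "industry" are hypothetical names that may not match the columns in the exercise's dataset. Filling first by company and then falling back to industry is likewise just one possible way to combine the two results.

```python
import numpy as np
import pandas as pd

# Made-up data; "company" and "industry" are hypothetical column names.
dataset = pd.DataFrame(
    {
        "company": ["A", "A", "A", "B", "B", "C", "C", "D"],
        "industry": ["Retail", "Retail", "Retail", "Retail", "Retail", "Tech", "Tech", "Tech"],
        "Total Current Liabilities": [10.0, np.nan, 30.0, 50.0, np.nan, 100.0, 200.0, np.nan],
    }
)

col = "Total Current Liabilities"

# 70th percentile of each company's non-missing values, repeated on every row
# of that company. Company D has no non-missing values, so it gets NaN.
impute_by_company = dataset.groupby("company")[col].transform(lambda s: s.quantile(0.7))

# Same idea at the industry level.
impute_by_industry = dataset.groupby("industry")[col].transform(lambda s: s.quantile(0.7))

# Fill with the company-level value, then fall back on the industry-level value
# for companies that had no data at all. fillna aligns on the row index.
filled = dataset[col].fillna(impute_by_company).fillna(impute_by_industry)

print(impute_by_company.tolist())
print(impute_by_industry.tolist())
print(filled.tolist())
```

In this toy data the missing row for company A is filled with about 24.0, the one for company B with 50.0, and company D, which has no observed liabilities, falls back on the Tech industry's 70th percentile of about 170.0.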