Exercise

Imputing missing values with percentiles

In this exercise, you'll continue to practice imputing missing values. Unlike the previous exercise, however, you will use percentiles in place of averages to compute the imputations. Percentiles let you impute conservatively: instead of filling a gap with a typical value, you fill it with a deliberately pessimistic one. Imputing missing values in a column using percentiles involves the following steps:

  • Remove the missing values from the column of interest.
  • Compute a chosen percentile, say the 70th, of the remaining non-missing values.
  • Fill the missing values in the column with that percentile.
  • Which percentile gives the "70th percentile worst" value depends on the column:
    • Having a large amount of assets is considered good, so a low amount of assets is worse. The 70th percentile worst value of assets is therefore the 30th percentile of assets.
    • Analogously, a high amount of liabilities is considered bad, so the 70th percentile worst value of liabilities is simply its 70th percentile.
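
The steps above can be sketched as follows. This is a minimal illustration, not the exercise solution: the toy DataFrame, the "Total Assets" column, and the helper worst_case_value are hypothetical names introduced here, and the sketch assumes NumPy's default linear interpolation between data points.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real `dataset` is not shown in the exercise.
toy = pd.DataFrame({
    "Total Assets": [100.0, 200.0, np.nan, 400.0, 500.0],
    "Total Current Liabilities": [10.0, np.nan, 30.0, 40.0, 50.0],
})

def worst_case_value(column, pct, higher_is_worse):
    """Return the `pct`th percentile worst value of a column (hypothetical helper)."""
    # Step 1: remove the missing values.
    non_missing = column.dropna()
    # Step 2: pick the percentile. If low values are the bad ones,
    # the pct-th worst value is the (100 - pct)-th percentile.
    q = pct if higher_is_worse else 100 - pct
    return np.percentile(non_missing, q)

# High liabilities are bad, so use the 70th percentile directly.
liab_fill = worst_case_value(toy["Total Current Liabilities"], 70, higher_is_worse=True)
# Low assets are bad, so the 70th percentile worst value is the 30th percentile.
asset_fill = worst_case_value(toy["Total Assets"], 70, higher_is_worse=False)

# Step 3: fill the gaps in each column with its own worst-case value.
imputed = toy.fillna({
    "Total Current Liabilities": liab_fill,
    "Total Assets": asset_fill,
})
print(imputed)
```

Passing a dictionary to fillna keeps each column's fill value separate, so the liabilities percentile is never written into the assets column.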

pandas has been loaded with the alias pd and NumPy has been loaded with the alias np. A pandas DataFrame called dataset has been loaded for you. It has the column "Total Current Liabilities", which has some missing values.

Instructions 1/2

  • Impute the missing values in "Total Current Liabilities", grouped by "company", using the 70th percentile of the non-missing values in each group.
  • Impute the missing values in "Total Current Liabilities", grouped by "comp_type", using the 70th percentile of the non-missing values in each group.
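
One possible way to approach grouped imputation is sketched below. It is not the official solution: dataset_demo and impute_by_group are hypothetical names, the toy values are invented for illustration, and the sketch uses pandas' Series.quantile (which skips missing values by default) in place of a NumPy percentile call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `dataset`; only the column names come from the exercise.
dataset_demo = pd.DataFrame({
    "company": ["A", "A", "A", "B", "B", "B"],
    "comp_type": ["tech", "tech", "tech", "fmcg", "fmcg", "fmcg"],
    "Total Current Liabilities": [10.0, 20.0, np.nan, 100.0, np.nan, 200.0],
})

def impute_by_group(df, group_col, value_col, pct=70):
    """Fill gaps in `value_col` with each group's `pct`th percentile (hypothetical helper)."""
    def fill_group(values):
        # quantile() ignores NaN, so it sees only the non-missing values.
        return values.fillna(values.quantile(pct / 100))
    # transform() returns a Series aligned with the original rows.
    return df.groupby(group_col)[value_col].transform(fill_group)

by_company = impute_by_group(dataset_demo, "company", "Total Current Liabilities")
by_comp_type = impute_by_group(dataset_demo, "comp_type", "Total Current Liabilities")
print(by_company)
```

Because transform preserves the original index, the result can be assigned straight back to the column. A group whose values are all missing stays missing, since its percentile is itself undefined.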