Exercise

Using hierarchies for categorical data

In this exercise, you will create and use hierarchies to apply data generalization to the bachelors column of the US Adult Income dataset.

A dictionary containing the initial hierarchies is available as hierarchies. It holds three education categories (Primary, Secondary, and Higher), and each category maps to a list of the corresponding education values found in the data. Feel free to explore it in the interactive console.

You will create a new dictionary that maps each original education value to its generalized education category, and then use it to replace the original values.
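A minimal sketch of that inversion, assuming hierarchies maps each category name to a list of education values. The category names and values below are hypothetical placeholders; the real dictionary in the exercise may differ.

```python
# Hypothetical hierarchy: the real `hierarchies` dict in the exercise may
# use different category names and education values.
hierarchies = {
    "Primary education": ["Preschool", "1st-4th", "5th-6th"],
    "Secondary education": ["11th", "12th", "HS-grad"],
    "Higher education": ["Some-college", "Bachelors", "Masters"],
}

# Invert the mapping: each education value becomes a key whose value is
# the category it belongs to.
education_hierarchy = {}
for education_type, values in hierarchies.items():
    for value in values:
        education_hierarchy[value] = education_type

print(education_hierarchy["Some-college"])  # Higher education
```

Because every value appears under exactly one category, the inverted dictionary has one entry per education value.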

The dataset is available as income_df.

Instructions

  • Initialize the education_hierarchy as an empty dictionary.
  • Complete the inner loop so that each education value becomes a key and its education type (the hierarchies key) becomes the value. For example: {'Some-college': 'Higher education'}.
  • Apply education hierarchy generalization to the bachelors column, assigning the result to the new column bachelors_generalized.
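The steps above can be sketched with pandas, assuming income_df is a DataFrame whose bachelors column holds education strings. The small DataFrame and the mapping values here are hypothetical stand-ins for the exercise data.

```python
import pandas as pd

# Hypothetical stand-in for income_df; only the bachelors column is modeled.
income_df = pd.DataFrame(
    {"bachelors": ["Bachelors", "HS-grad", "Some-college", "5th-6th"]}
)

# Value -> category mapping (hypothetical values), shaped like the
# education_hierarchy built by the inner loop.
education_hierarchy = {
    "5th-6th": "Primary education",
    "HS-grad": "Secondary education",
    "Some-college": "Higher education",
    "Bachelors": "Higher education",
}

# Series.map looks up each value in the dict; values missing from the
# dict become NaN, so check coverage before relying on the result.
income_df["bachelors_generalized"] = income_df["bachelors"].map(education_hierarchy)
print(income_df)
```

Using `Series.map` makes unmapped values visible as NaN. `Series.replace(education_hierarchy)` is an alternative that leaves any unmapped value unchanged instead.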