Using hierarchies for categorical data
In this exercise, you will create and use hierarchies to apply data generalization on the bachelors
column of the US Adult Income dataset.
An initial dictionary containing the hierarchies is available for you as hierarchies. It holds three categories for the education types: Primary, Secondary and Higher; each has a list of the data's corresponding education values. Feel free to explore it in the interactive console.
We will create a new dictionary that holds the generalized education information and use it to replace the original values.
The dataset is available as income_df.
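To make the structure concrete, here is a minimal sketch of how a category-to-values dictionary can be inverted into a flat value-to-category lookup. The name example_hierarchies and its contents are illustrative assumptions, not the actual contents of hierarchies.

```python
# Hypothetical stand-in for the provided `hierarchies` dictionary;
# the real one may hold different keys and values.
example_hierarchies = {
    "Primary": ["1st-4th", "5th-6th"],
    "Secondary": ["9th", "HS-grad"],
    "Higher": ["Some-college", "Bachelors", "Masters"],
}

# Invert the mapping: each specific education value points to its broader category.
example_lookup = {
    value: category
    for category, values in example_hierarchies.items()
    for value in values
}

print(example_lookup["Bachelors"])  # prints Higher
```

The flat lookup is what makes the later column replacement a single step, since each original value can be looked up directly.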
This exercise is part of the course
Data Privacy and Anonymization in Python
Instructions
- Initialize the education_hierarchy as an empty dictionary.
- Complete the inner loop to assign the education type key as the value. For example, {'Some-college': 'Higher education'}.
- Apply education hierarchy generalization to the bachelors column, assigning the result to the new column bachelors_generalized.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Initialize dictionary for each education category value
education_hierarchy = ____
# Create hierarchy for each of the education category values
for key, education_values in hierarchies.items():
    for education in education_values:
        education_hierarchy[education] = ____
# Apply education_hierarchy generalization to bachelors
income_df['bachelors_generalized'] = ____
# See resulting dataset
print(income_df.head())
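
As a rough illustration of the final step, a value-to-category lookup can be applied to a column with pandas Series.map. The frame toy_df and the mapping toy_lookup below are made up for this sketch; only the column names bachelors and bachelors_generalized come from the exercise. One thing worth noting is that map yields NaN for any value missing from the dictionary, so a quick check for gaps is included.

```python
import pandas as pd

# Toy data, assumed for illustration only (not the real income_df).
toy_df = pd.DataFrame({"bachelors": ["Bachelors", "HS-grad", "Doctorate"]})

# Hypothetical lookup that deliberately omits "Doctorate".
toy_lookup = {"Bachelors": "Higher", "HS-grad": "Secondary"}

# Replace each specific value with its broader category.
toy_df["bachelors_generalized"] = toy_df["bachelors"].map(toy_lookup)

# Values absent from the lookup become NaN; list them so they can be handled.
unmapped = toy_df.loc[toy_df["bachelors_generalized"].isna(), "bachelors"].tolist()
print(unmapped)
```

If unmapped values should keep their original label instead of becoming NaN, Series.replace with the same dictionary is one alternative, at the cost of leaving those values ungeneralized.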