Explore the distribution of data
When we want to anonymize a dataset by sampling data in a very realistic way, we need to acquire some domain and statistical knowledge of the data. As we have seen, finding the probability distribution of the column of interest is key.
In this exercise, you will explore the column business_travel from a simplified version of the IBM HR dataset.
The DataFrame has been imported as hr and numpy as np. As said in the previous chapter, pandas has been imported as pd for this and the rest of the course.
This exercise is part of the course Data Privacy and Anonymization in Python.
Interactive exercise
Try this exercise by completing the sample code.
# Print the absolute frequencies of each unique value
print(____)
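
As a rough illustration of what exploring a column's distribution can look like, the sketch below counts values on a small made-up DataFrame. The name hr_example and its contents are hypothetical and are not taken from the IBM HR dataset; only the column name business_travel comes from the exercise. It assumes pandas' Series.value_counts() is a suitable way to get the frequencies.

```python
import pandas as pd

# Hypothetical stand-in for the hr DataFrame; the values are invented for illustration
hr_example = pd.DataFrame({
    "business_travel": [
        "Travel_Rarely", "Travel_Frequently", "Travel_Rarely",
        "Non-Travel", "Travel_Rarely", "Travel_Frequently",
    ]
})

# Absolute frequencies: how many rows hold each unique value
counts = hr_example["business_travel"].value_counts()
print(counts)

# Relative frequencies: the same counts divided by the number of rows,
# which can be read as an empirical probability distribution
probs = hr_example["business_travel"].value_counts(normalize=True)
print(probs)
```

The relative frequencies sum to 1, so they could later serve as sampling weights when generating realistic replacement values for the column.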