Exercise

Sampling from the best continuous distribution

Replacing real values with random samples drawn from a well-fitting probability distribution helps protect privacy, while still allowing authorized parties to run an accurate statistical analysis of the data.

In this exercise, you will anonymize the column monthly_income from the IBM dataset. In the previous lesson, you determined the exponnorm continuous distribution to be the best fit. Use it to model the incomes.

The dataset is available as hr.

Instructions

  • Import the stats module from the scipy package.
  • Fit the exponnorm distribution to the continuous variable monthly_income to obtain the distribution's parameters, which are needed later to generate the samples.
  • Sample from the fitted exponnorm distribution using the .rvs() method and replace monthly_income with the samples. Set size to the length of the column.
  • Round the salaries to the nearest integer.
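
The steps above could be sketched as follows. This is only one possible approach, not the official solution. Because the real hr dataset is not included here, the sketch builds a small synthetic stand-in with a monthly_income column; the parameter values used to generate it, the fixed seeds, and the names params and samples are assumptions made for illustration.

```python
import pandas as pd
from scipy import stats

# ASSUMPTION: synthetic stand-in for the exercise's `hr` DataFrame.
# In the exercise environment `hr` is already loaded, so skip this step there.
hr = pd.DataFrame({
    "monthly_income": stats.exponnorm.rvs(
        3.0, loc=2500, scale=1200, size=500, random_state=0
    ).round()
})

# Fit the exponnorm distribution to the column.
# For exponnorm, fit() returns a tuple (K, loc, scale), where K is the shape parameter.
params = stats.exponnorm.fit(hr["monthly_income"])
K, loc, scale = params

# Draw as many samples as there are rows in the column.
# random_state is fixed only to make the sketch reproducible.
samples = stats.exponnorm.rvs(
    K, loc=loc, scale=scale, size=len(hr["monthly_income"]), random_state=1
)

# Replace the original incomes with the rounded samples.
hr["monthly_income"] = samples.round().astype(int)

print(hr["monthly_income"].head())
```

Unpacking the fitted tuple into K, loc, and scale keeps the call to .rvs() explicit; passing *params positionally would work as well, since .rvs() accepts the shape parameter followed by loc and scale.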