Fitting a normal distribution
When working with relatively small data sets you often don't have enough data to make principled inference. However, if you suspect the data follows a normal distribution, it may be reasonable to fit a normal distribution and work with this, rather than with the raw data. In this exercise you will work the same data on Hispanic firefighters which you previously showed was normally distributed at the 5% level. You will fit a normal distribution to it, and use this to find the percentage of these employees we would generally expect to have less than 10 years of experience.
This DataFrame has been loaded for you in salary_df
. The packages pandas as pd
, NumPy as np
, Matplotlib as plt
, and the stats
package from SciPy have all been loaded for you.
This exercise is part of the course
Foundations of Inference in Python
Exercise instructions
- Fit a normal distribution to the
Years of Employment
column and save the resulting mean and standard deviation. - Use this mean and standard deviation in a normal CDF to estimate the percentage of employees with less than ten years of experience.
- Print out this percentage.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Fit a normal distribution to the data
mu, std = ____
# Compute the percentage of employees with less than 10 years experience
percent = stats.____(____, loc=____, scale=____)
# Print out this percentage
____