Testing for normality
A powerful suite of statistical tools, including several common hypothesis tests, depends on the assumption that the underlying data are normally distributed. A histogram can hint at whether the data are approximately normal, but its shape can be very sensitive to the number of bins, especially when the sample size is small. Hypothesis tests for normality let us check the assumption directly.
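To illustrate the sensitivity to bin count, here is a minimal sketch using a small synthetic sample. The sample, its parameters, and the bin counts are assumptions chosen for illustration, not the course data.

```python
import numpy as np

# Hypothetical small sample drawn from a normal distribution (not the course data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=3, size=30)

# Bin the same 30 values three different ways
counts_by_bins = {}
for bins in (3, 8, 20):
    counts, edges = np.histogram(sample, bins=bins)
    counts_by_bins[bins] = counts
    print(bins, counts)
```

With only 30 observations, the coarse binning tends to look smooth while the fine binning looks ragged, even though the underlying data are identical.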
In this exercise you'll work with salary data for City of Austin employees, stored in salary_df. In particular, you'll work with Hispanic firefighters and analyze whether their years of employment are approximately normally distributed using the Anderson-Darling hypothesis test.
This exercise is part of the course
Foundations of Inference in Python
Exercise instructions
- Plot a histogram showing the "Years of Employment" for the employees.
- Conduct an Anderson-Darling test for normality to see if "Years of Employment" is approximately normally distributed.
- Find which critical_values the test statistic is greater than.
- Print the significance_level(s) at which the null hypothesis would be rejected.
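As a rough sketch of how the test result is read, the example below runs scipy.stats.anderson on synthetic, deliberately right-skewed data. The array years and its distribution are assumptions standing in for a real column. It is not the exercise solution for salary_df, and newer SciPy releases may emit a warning about changing defaults for this function.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed data standing in for a column of employment years
rng = np.random.default_rng(42)
years = rng.exponential(scale=8.0, size=200)

# Anderson-Darling test against the normal distribution
result = stats.anderson(years, dist="norm")

# Boolean mask: True where the test statistic exceeds a critical value
exceeded = result.statistic > result.critical_values

# Significance levels (reported by SciPy as percentages) at which normality is rejected
rejected_levels = result.significance_level[exceeded]

print(result.statistic)
print(exceeded)
print(rejected_levels)
```

Each critical value is paired with one significance level, so indexing significance_level with the Boolean mask returns exactly the levels at which the null hypothesis of normality is rejected.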
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Plot a histogram of the employees' "Years of Employment"
____.plot(kind="____")
plt.show()
# Conduct an Anderson-Darling test using the years of employment from salary_df
result = stats.____(____)
# Print which critical values the test statistic is greater than
print(result.____ > result.____)
# Print the significance levels at which the null hypothesis is rejected
print(result.____[result.____ > result.____])