
Testing for normality

Many common statistical tools, including several hypothesis tests, assume that the underlying data are normally distributed. A histogram can hint at whether the data are approximately normal, but its shape can be very sensitive to the number of bins, especially when the sample size is small. Hypothesis tests for normality let us test this assumption directly.
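To illustrate how much a histogram's shape can depend on the bin count, here is a minimal sketch that bins one small synthetic sample three different ways. The sample, the seed, and the bin counts are assumptions chosen for illustration; they are not part of the exercise data.

```python
import numpy as np

# Hypothetical small sample drawn from a normal distribution (illustration only)
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=10, scale=3, size=30)

# Bin the same 30 values with different bin counts and compare the shapes
for bins in (4, 8, 16):
    counts, edges = np.histogram(sample, bins=bins)
    print(f"{bins:>2} bins: {counts.tolist()}")
```

With only 30 observations, the finer binnings tend to show gaps and isolated spikes that the coarse binning smooths over, even though the underlying values are identical.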

In this exercise, you'll work with salary data for City of Austin employees, stored in salary_df. In particular, you'll work with Hispanic firefighters and use the Anderson-Darling hypothesis test to check whether their years of employment are approximately normally distributed.

This exercise is part of the course

Foundations of Inference in Python


Exercise instructions

  • Plot a histogram showing the Years of Employment for the employees.
  • Conduct an Anderson-Darling test for normality to see if Years of Employment is approximately normally distributed.
  • Find which critical_values the test statistic is greater than.
  • Print the significance_level(s) at which the null hypothesis would be rejected.
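The steps above can be sketched on synthetic data. This is a minimal illustration, not the exercise solution: the right-skewed sample named years is a hypothetical stand-in for the real column in salary_df, and the seed and sample size are arbitrary. It uses scipy.stats.anderson, whose result exposes statistic, critical_values, and significance_level (the latter expressed in percent).

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed data standing in for years of employment
rng = np.random.default_rng(seed=42)
years = rng.exponential(scale=8, size=200)

# Anderson-Darling test against the normal distribution
result = stats.anderson(years, dist="norm")

# Boolean mask: which critical values does the test statistic exceed?
exceeded = result.statistic > result.critical_values
print(result.statistic)
print(result.critical_values)
print(exceeded)

# Significance levels (in percent) at which normality is rejected
print(result.significance_level[exceeded])
```

The null hypothesis is that the data come from a normal distribution, so it is rejected at every significance level whose critical value is smaller than the test statistic. For strongly skewed data like this sample, that should be every level reported.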

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Plot a histogram of the employees' "Years of Employment"
____.plot(kind="____")
plt.show()

# Conduct an Anderson-Darling test using the years of employment from salary_df
result = stats.____(____)

# Print which critical values the test statistic exceeds
print(result.____ > result.____)

# Print the significance levels at which the null hypothesis is rejected
print(result.____[result.____ > result.____])