
Testing for normality

Many powerful statistical tools, including several common hypothesis tests, assume that the underlying data is normally distributed. A histogram can hint at whether the data is approximately normal, but histograms can be very sensitive to the number of bins, especially when the sample size is small. Hypothesis tests for normality allow us to test this assumption directly.
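To illustrate the point about bin sensitivity, here is a minimal sketch that bins the same small sample three different ways. The sample is synthetic (drawn from a normal distribution with an arbitrary seed, mean, and spread) and is not the course's data; the bin counts 4, 8, and 16 are likewise arbitrary choices.

```python
import numpy as np

# Synthetic small sample (an assumption for illustration, not the course data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=3, size=25)

# Bin the same 25 observations with different numbers of bins
for bins in (4, 8, 16):
    counts, edges = np.histogram(sample, bins=bins)
    print(f"{bins:>2} bins: {counts.tolist()}")
```

With only 25 observations, the coarse binning tends to look smooth while the fine binning tends to show gaps and spikes, even though the underlying data is identical.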

In this exercise you'll work with salary data for City of Austin employees, stored in salary_df. In particular, you will be working with Hispanic firefighters. You'll analyze whether their years of employment are approximately normally distributed using the Anderson-Darling hypothesis test.

This exercise is part of the course

Foundations of Inference in Python


Instructions

  • Plot a histogram showing the Years of Employment for the employees.
  • Conduct an Anderson-Darling test for normality to see if Years of Employment is approximately normally distributed.
  • Find which critical_values the test statistic is greater than.
  • Print the significance_level(s) at which the null hypothesis would be rejected.
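The test steps above can be sketched on synthetic data as follows. This is not the exercise solution: the `years` array is a made-up, right-skewed sample standing in for a years-of-employment column, and the seed and scale are arbitrary. The sketch assumes the `statistic`, `critical_values`, and `significance_level` attributes returned by `scipy.stats.anderson`; newer SciPy releases may change this interface.

```python
import numpy as np
from scipy import stats

# Synthetic, right-skewed stand-in for a years-of-employment column
# (an assumption for illustration; not taken from salary_df)
rng = np.random.default_rng(42)
years = rng.exponential(scale=8.0, size=200)

# Anderson-Darling test against the normal distribution
result = stats.anderson(years, dist="norm")

# Boolean mask: which critical values does the test statistic exceed?
exceeds = result.statistic > result.critical_values
print(result.statistic)
print(exceeds)

# Significance levels (in percent) at which normality is rejected
rejected_levels = result.significance_level[exceeds]
print(rejected_levels)
```

Each entry in `critical_values` pairs with the entry at the same position in `significance_level`, so the boolean mask can index the significance levels directly. The levels are reported as percentages, so a value of 5 corresponds to a 0.05 significance level.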

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Plot a histogram of the employees' "Years of Employment"
____.plot(kind="____")
plt.show()

# Conduct an Anderson-Darling test using the years of employment from salary_df
result = stats.____(____)

# Print which critical values the test statistic is greater than
print(result.____ > result.____)

# Print the significance levels at which the null hypothesis is rejected
print(result.____[result.____ > result.____])