
Testing for normality

Many common statistical tools, including several hypothesis tests, assume that the underlying data is normally distributed. A histogram can hint at whether the data is approximately normal, but histograms can be very sensitive to the number of bins, especially when the sample size is small. Hypothesis tests for normality let us test this assumption directly.
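To make the point about bin sensitivity concrete, here is a minimal sketch that bins the same small sample two different ways. The sample is synthetic (drawn from a normal distribution with made-up parameters), not the Austin salary data, and the bin counts 4 and 12 are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic small sample (assumption: 25 draws from a normal distribution)
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=10, scale=3, size=25)

# Bin the same data coarsely and finely
counts_coarse, _ = np.histogram(sample, bins=4)
counts_fine, _ = np.histogram(sample, bins=12)

# The two sets of counts can suggest quite different shapes
print("4 bins: ", counts_coarse)
print("12 bins:", counts_fine)
```

With so few observations, the finer binning tends to look ragged even though the data really is normal, which is why a formal test is a useful complement to the plot.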

In this exercise you'll work with salary data for City of Austin employees, stored in salary_df. In particular, you will be working with Hispanic firefighters. You'll analyze whether their years of employment are approximately normally distributed using the Anderson-Darling hypothesis test.

This exercise is part of the course

Foundations of Inference in Python


Exercise instructions

  • Plot a histogram showing the Years of Employment for the employees.
  • Conduct an Anderson-Darling test for normality to see if Years of Employment is approximately normally distributed.
  • Find which critical_values the test statistic is greater than.
  • Print the significance_level(s) at which the null hypothesis would be rejected.
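The steps above can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the exercise solution: the years array is a made-up, right-skewed sample standing in for the real column, and the attribute names (statistic, critical_values, significance_level) are those of the result object returned by scipy.stats.anderson. Newer SciPy releases may warn about upcoming changes to this function's defaults.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the years-of-employment column:
# an exponential sample, which is clearly not normal
rng = np.random.default_rng(seed=42)
years = rng.exponential(scale=8.0, size=200)

# Anderson-Darling test against the normal distribution
result = stats.anderson(years, dist="norm")

# Boolean mask: which critical values does the test statistic exceed?
exceeds = result.statistic > result.critical_values
print(exceeds)

# Significance levels (in percent) at which normality is rejected
print(result.significance_level[exceeds])
```

The null hypothesis is that the data come from a normal distribution, so each critical value the statistic exceeds corresponds to a significance level at which that hypothesis is rejected.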

Interactive exercise

Try this exercise by completing this sample code.

# Plot a histogram of the employees' "Years of Employment"
____.plot(kind="____")
plt.show()

# Conduct an Anderson-Darling test using the years of employment from salary_df
result = stats.____(____)

# Print which critical values the test statistic is greater than
print(result.____ > result.____)

# Print the significance levels at which the null hypothesis is rejected
print(result.____[result.____ > result.____])