Inference for Categorical Data in R


Goodness-of-fit test

The null hypothesis in a goodness-of-fit test specifies a particular value for each proportion. In your analysis, the corresponding hypothesis is that Benford's Law applies to the distribution of first digits of total vote counts at the city level. You could write this as:

$$ H_0: p_1 = .30, p_2 = .18, \ldots, p_9 = .05 $$

Here \(p_1\) is the height of the first bar in the Benford bar plot, that is, the proportion of counts whose first digit is 1, and likewise for \(p_2, \ldots, p_9\). The alternative hypothesis is that at least one of these proportions is different, meaning that the first-digit distribution does not follow Benford's Law.
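The rounded values in \(H_0\) come from Benford's Law, which gives the probability that the first digit is \(d\) as \(p_d = \log_{10}(1 + 1/d)\) for \(d = 1, \ldots, 9\). As a minimal sketch (written in plain Python for illustration rather than with the course's R tools; the helper names benford_probs and first_digit are hypothetical), the nine probabilities and the first digit of a vote total can be computed like this:

```python
import math


def benford_probs():
    """Return Benford's Law probabilities for first digits 1 through 9."""
    # p_d = log10(1 + 1/d); the nine terms telescope to log10(10) = 1.
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(n):
    """Return the leading (most significant) digit of a positive integer."""
    if n <= 0:
        raise ValueError("first_digit expects a positive integer")
    return int(str(n)[0])


if __name__ == "__main__":
    p = benford_probs()
    # Rounded to two decimals: p_1 = .30, p_2 = .18, ..., p_9 = .05, as in H_0
    print([round(x, 2) for x in p])
```

Because the probabilities telescope to exactly 1, they form a valid distribution over the nine possible first digits, which is what the point null hypothesis requires.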

In this exercise, you'll use simulation to build the null distribution of chi-squared statistics: the values you would expect to observe if these counts did in fact follow Benford's Law.
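For reference, the statistic being simulated is the usual chi-squared statistic. It compares the observed number \(O_d\) of city-level vote totals whose first digit is \(d\) with the number expected under \(H_0\), \(n p_d\), where \(n\) is the number of cities:

$$ \chi^2 = \sum_{d=1}^{9} \frac{(O_d - n p_d)^2}{n p_d} $$

Larger values indicate a worse fit to Benford's Law. With nine categories, the theoretical approximation would be a chi-squared distribution with 8 degrees of freedom; the simulation approach instead builds the null distribution directly from data generated under \(H_0\).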

Instructions

  • Inspect p_benford by printing it to the screen.
  • Starting with iran, compute the observed chi-squared statistic using chisq_stat. You must specify both the variable in the data frame that serves as the response and the vector of probabilities to compare its distribution against.
  • Construct a null distribution of 500 chi-squared (Chisq) statistics via simulation under the point null hypothesis that the vector of proportions p is p_benford. Save the resulting statistics as null.
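The steps above can be sketched outside R to show what the simulation does. This is a minimal plain-Python illustration under stated assumptions, not the course's chisq_stat or its R pipeline: the helper names (benford_probs, chisq_statistic, simulate_null, p_value) are hypothetical, each null sample is drawn by picking first digits with the Benford probabilities, and the demo data are synthetic uniform digits standing in for the iran counts.

```python
import math
import random


def benford_probs():
    # Benford's Law: P(first digit = d) = log10(1 + 1/d), d = 1..9
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def chisq_statistic(counts, probs):
    """Sum over categories of (O - E)^2 / E, with expected count E = n * p."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def simulate_null(n, probs, reps=500, seed=None):
    """Draw `reps` samples of size n under the point null; return their statistics."""
    rng = random.Random(seed)
    digits = list(range(1, 10))
    stats = []
    for _ in range(reps):
        # One simulated data set: n first digits drawn with the null probabilities
        sample = rng.choices(digits, weights=probs, k=n)
        counts = [sample.count(d) for d in digits]
        stats.append(chisq_statistic(counts, probs))
    return stats


def p_value(observed, null_stats):
    """Share of simulated statistics at least as large as the observed one."""
    return sum(s >= observed for s in null_stats) / len(null_stats)


if __name__ == "__main__":
    p_benford = benford_probs()
    # Hypothetical stand-in data (NOT the iran data): 300 uniformly random first digits
    rng = random.Random(1)
    observed_digits = [rng.randint(1, 9) for _ in range(300)]
    observed_counts = [observed_digits.count(d) for d in range(1, 10)]

    obs_stat = chisq_statistic(observed_counts, p_benford)
    null = simulate_null(sum(observed_counts), p_benford, reps=500, seed=2)
    print(obs_stat, p_value(obs_stat, null))
```

Each simulated sample uses the same sample size as the observed data, so the 500 statistics in null describe how much the chi-squared statistic varies by chance alone when Benford's Law holds; the observed statistic is then judged against that distribution.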