Goodness of fit test
The null hypothesis in a goodness of fit test is a list of specific parameter values for each proportion. In your analysis, the equivalent hypothesis is that Benford's Law applies to the distribution of first digits of total vote counts at the city level. You could write this as:
$$ H_0: p_1 = .30, p_2 = .18, \ldots, p_9 = .05 $$
Here, \(p_1\) is the height of the first bar in the Benford's Law bar plot, that is, the expected proportion of counts whose first digit is 1. The alternative hypothesis is that at least one of these proportions is different, meaning that the first-digit distribution doesn't follow Benford's Law.
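For reference, the rounded values in \(H_0\) agree with the usual statement of Benford's Law for a first digit \(d\). The index \(d\) is notation introduced here for illustration and does not appear in the exercise itself:
$$ p_d = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, \ldots, 9 $$
For example, \(p_1 = \log_{10}(2) \approx .301\), \(p_2 = \log_{10}(1.5) \approx .176\), and \(p_9 = \log_{10}(10/9) \approx .046\), which round to the .30, .18, and .05 shown above.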
In this exercise, you'll use simulation to build the null distribution of the chi-squared statistics that you'd observe if these counts did in fact follow Benford's Law.
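As a reminder of what is being simulated, the chi-squared goodness of fit statistic is conventionally written as below. The symbols \(O_d\) (the observed number of counts with first digit \(d\)) and \(n\) (the total number of counts) are notation assumed here, not taken from the exercise:
$$ \chi^2 = \sum_{d=1}^{9} \frac{(O_d - n p_d)^2}{n p_d} $$
Each term compares an observed count with the count \(n p_d\) expected under \(H_0\), so large values of the statistic indicate a poor fit to Benford's Law.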
This exercise is part of the course Inference for Categorical Data in R.
Exercise instructions
- Inspect `p_benford` by printing it to the screen.
- Starting with `iran`, compute the chi-squared statistic by using `chisq_stat`. Note that you must specify the variable in the data frame that will serve as your response as well as the vector of probabilities that you wish to compare it to.
- Construct a null distribution with 500 samples of the `Chisq` statistic via simulation under the `point` null hypothesis that the vector of proportions `p` is `p_benford`. Save the resulting statistics as `null`.
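To make the idea behind the last step concrete, here is a minimal base-R sketch of simulating chi-squared statistics under a point null hypothesis. It is not the exercise solution and does not use the course's data or functions. The sample size `n`, the seed, the helper `chisq_gof`, and the names `sim_counts` and `null_stats` are all made up for illustration, and `p_benford` is rebuilt here from the logarithmic form of Benford's Law on the assumption that it matches the vector supplied in the exercise.

```r
# Base-R sketch: null distribution of the chi-squared statistic under a point null.
# n, the seed, and all helper names below are hypothetical, not from the exercise.
set.seed(1)

# Benford's Law probabilities for first digits 1 through 9 (assumed form)
p_benford <- log10(1 + 1 / (1:9))
names(p_benford) <- 1:9

n <- 300      # hypothetical number of city-level vote counts
reps <- 500   # number of simulated data sets, as in the exercise

# Chi-squared goodness of fit statistic for a vector of observed counts
chisq_gof <- function(counts, p) {
  expected <- sum(counts) * p
  sum((counts - expected)^2 / expected)
}

# Each column of sim_counts holds the digit counts of one data set drawn under H0
sim_counts <- rmultinom(reps, size = n, prob = p_benford)

# One chi-squared statistic per simulated data set
null_stats <- apply(sim_counts, 2, chisq_gof, p = p_benford)

# Summarize the simulated null distribution
summary(null_stats)
```

An observed statistic can then be judged against `null_stats`, for instance by the proportion of simulated values that are at least as large as it.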
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Inspect p_benford
p_benford
# Compute observed stat
chi_obs_stat <- ___ %>%
  chisq_stat(response = ___, p = ___)
# Form null distribution
null <- ___ %>%
  # Specify the response
  ___ %>%
  # Set up the null hypothesis
  ___ %>%
  # Generate 500 reps
  ___ %>%
  # Calculate statistics
  ___