Extracting the first digit I
To address the question of voter fraud, begin by creating a new column of data containing the first digit of the total number of votes cast. For this, you'll need a custom function which we've created for you called get_first(). The core of this function is substr(), which will take a string and extract a section of it called a substring.
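As a small illustration of what substr() does, the sketch below pulls the first character out of a few made-up vote totals. The values are invented for the example and are not taken from the iran data.

```r
# Toy vote totals (made-up values, not from the iran data)
votes <- c(4521, 987, 13004)

# substr() operates on strings, so convert the numbers explicitly first
vote_strings <- as.character(votes)

# Extract the substring that starts and stops at position 1
first_chars <- substr(vote_strings, start = 1, stop = 1)
print(first_chars)
```

The result is a character vector, which is why a wrapper that converts it to a factor is convenient for plotting.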
Once you create a new variable containing only the first digit, you can get a sense of how closely it follows Benford's Law by constructing a bar plot.
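For reference when reading the bar plot, Benford's Law gives the expected proportion of leading digit d as log10(1 + 1/d). A minimal sketch that computes those proportions (the variable names here are chosen for illustration only):

```r
# Leading digits can only be 1 through 9
digits <- 1:9

# Expected proportion of each leading digit under Benford's Law
benford <- log10(1 + 1 / digits)
names(benford) <- digits

print(round(benford, 3))
```

Under this law the digit 1 leads about 30% of the time and the proportions fall steadily toward 9, so a bar plot of first digits that looks roughly flat would not resemble the Benford pattern.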
This is a part of the course “Inference for Categorical Data in R”.
Exercise instructions
- Take a look at how get_first() works by simply typing the name of the function (with no parentheses). All it does is fiddle with the output from substr() so that it's a factor.
- Mutate a new column in the iran data frame called first_digit that contains the first digit of city-by-city total votes cast.
- Check to see that get_first() worked. From the iran data, select the columns total_votes_cast and first_digit and print them to the screen.
- Construct a bar plot to visualize the distribution of the first digit.
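To make the first instruction concrete, here is one way a helper like get_first() might be written. This is a hypothetical sketch under the assumption that it wraps substr() and returns a factor with levels 1 through 9; the function name get_first_sketch and the toy data are invented, and the course's actual get_first() may differ, so print it as instructed to see the real definition.

```r
# Hypothetical stand-in for get_first(); not the course's actual definition
get_first_sketch <- function(x) {
  # Take the first character of each value
  first_chars <- substr(as.character(x), 1, 1)
  # Return a factor so that all nine digits appear as levels,
  # including digits that never occur in the data
  factor(first_chars, levels = as.character(1:9))
}

# Made-up vote totals for illustration
toy_votes <- c(4521, 987, 13004, 1200)
toy_digits <- get_first_sketch(toy_votes)

# Count how often each leading digit occurs
print(table(toy_digits))
```

Fixing the factor levels keeps empty digits in the counts, which matters later because a bar plot of a factor can then show a zero-height bar instead of silently dropping the digit.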
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Print get_first
get_first
# Create first_digit variable
iran <- iran %>%
___
# Check if get_first worked
___ %>%
___
# Construct bar plot
___ +
# Add bar layer
___