Get Started

Extracting the first digit I

To address the question of voter fraud, begin by creating a new column of data containing the first digit of the total number of votes cast. For this, you'll need a custom function which we've created for you called get_first(). The core of this function is substr(), which will take a string and extract a section of it called a substring.

Once you create a new variable containing only the first digit, you can get a sense of how close it follows Benford's Law by constructing a bar plot.

This is a part of the course

“Inference for Categorical Data in R”

View Course

Exercise instructions

  • Take a look at how get_first() works by simply typing the name of the function (with no parentheses). All it does is fiddle with the output from substr() so that it's a factor.
  • Mutate a new column in the iran data frame called first_digit that contains the first digit of city by city total votes cast.
  • Check to see that get_first() worked. From the iran data, select off the columns total_votes_cast and first_digit and print them to the screen.
  • Construct a bar plot to visualize the distribution of the first digit.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Print get_first
get_first

# Create first_digit variable
iran <- iran %>%
  ___
  
# Check if get_first worked
___ %>%
  ___

# Construct bar plot
___ +
  # Add bar layer
  ___
Edit and Run Code