Extracting the first digit I
To address the question of voter fraud, begin by creating a new column of data containing the first digit of the total number of votes cast. For this, you'll need a custom function which we've created for you called get_first(). The core of this function is substr(), which will take a string and extract a section of it called a substring.
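To make the role of substr() concrete, here is a minimal sketch of what a helper like get_first() could look like. The name get_first_sketch and its body are assumptions for illustration only; the course's own get_first() may be defined differently (print it to see the real definition).

```r
# Hypothetical stand-in for the course's get_first(); the real definition may differ.
get_first_sketch <- function(x) {
  # Convert to character so substr() can pull out the leading character
  first_chr <- substr(as.character(x), start = 1, stop = 1)
  # Return a factor with one level per possible leading digit (1 through 9)
  factor(first_chr, levels = as.character(1:9))
}

# Made-up vote totals, used only to show the behaviour
get_first_sketch(c(3819, 120, 95540))
```

Fixing the levels to 1 through 9 is a design choice in this sketch: a digit that never appears still keeps a slot in tables and bar plots.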
Once you create a new variable containing only the first digit, you can get a sense of how closely it follows Benford's Law by constructing a bar plot.
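As a point of comparison for the bar plot, Benford's Law says that a leading digit d occurs with probability log10(1 + 1/d). The short sketch below computes those expected proportions; the variable names digits and benford_p are illustrative and not part of the course code.

```r
# Expected first-digit proportions under Benford's Law
digits <- 1:9
benford_p <- log10(1 + 1 / digits)

# Name each proportion by its digit and round for display
names(benford_p) <- digits
round(benford_p, 3)
```

Under this law about 30% of values start with a 1, and the proportions fall steadily to under 5% for a leading 9.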
This is a part of the course “Inference for Categorical Data in R”
Exercise instructions
- Take a look at how get_first() works by simply typing the name of the function (with no parentheses). All it does is fiddle with the output from substr() so that it's a factor.
- Mutate a new column in the iran data frame called first_digit that contains the first digit of city-by-city total votes cast.
- Check to see that get_first() worked. From the iran data, select off the columns total_votes_cast and first_digit and print them to the screen.
- Construct a bar plot to visualize the distribution of the first digit.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Print get_first
get_first
# Create first_digit variable
iran <- iran %>%
___
# Check if get_first worked
___ %>%
___
# Construct bar plot
___ +
# Add bar layer
___
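One possible way to fill in the blanks is sketched below. It runs on a small made-up data frame, iran_toy, rather than the course's iran data, and uses a hypothetical helper, get_first_sketch, in place of get_first(). The city column and all vote totals are assumptions for illustration; only the column names total_votes_cast and first_digit come from the exercise.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the course's iran data frame (all values are made up)
iran_toy <- data.frame(
  city = c("A", "B", "C", "D", "E"),
  total_votes_cast = c(3819, 120, 95540, 1477, 18260)
)

# Hypothetical stand-in for the course's get_first()
get_first_sketch <- function(x) {
  factor(substr(as.character(x), 1, 1), levels = as.character(1:9))
}

# Create first_digit variable
iran_toy <- iran_toy %>%
  mutate(first_digit = get_first_sketch(total_votes_cast))

# Check that the helper worked
iran_toy %>%
  select(total_votes_cast, first_digit)

# Construct bar plot
p <- ggplot(iran_toy, aes(x = first_digit)) +
  # Add bar layer
  geom_bar()
p
```

In the exercise itself, the same pattern would apply to iran and get_first() directly.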
Inference for Categorical Data in R
In this course you'll learn how to leverage statistical techniques for working with categorical data.
The course wraps up with two case studies using election data. Here, you'll learn how to use a Chi-squared test to check goodness-of-fit. You'll study election results from Iran and Iowa and test if Benford's law applies to these datasets.