Extracting the first digit I

To address the question of voter fraud, begin by creating a new column containing the first digit of the total number of votes cast. For this, you'll use get_first(), a custom function that we've created for you. The core of this function is substr(), which takes a string and extracts a section of it, called a substring.
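To make the role of substr() concrete, here is a minimal sketch in base R. The helper get_first_sketch() and the vote totals are hypothetical stand-ins for illustration; the course's own get_first() may be written differently, and you can inspect it by printing it.

```r
# substr(x, start, stop) extracts the characters from position start to stop.
substr("Benford", start = 1, stop = 3)

# Hypothetical stand-in for get_first(); the course's definition may differ.
get_first_sketch <- function(x) {
  # Convert to character, keep only the first character,
  # then return it as a factor with levels "1" through "9".
  digit <- substr(as.character(x), start = 1, stop = 1)
  factor(digit, levels = as.character(1:9))
}

votes <- c(48213, 1520, 907, 13344)  # made-up vote totals
get_first_sketch(votes)
```

Fixing the factor levels to 1 through 9 is a design choice in this sketch: it keeps digits that never occur in the data visible as empty categories when counting or plotting.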

Once you create a new variable containing only the first digit, you can get a sense of how closely it follows Benford's Law by constructing a bar plot.
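For reference when reading the bar plot, Benford's Law gives the expected proportion of leading digit d as log10(1 + 1/d). A short sketch that computes these proportions (the variable names are illustrative, not part of the exercise):

```r
# Expected proportion of each leading digit under Benford's Law:
# P(d) = log10(1 + 1/d) for d = 1, ..., 9.
digits <- 1:9
benford_p <- log10(1 + 1 / digits)
names(benford_p) <- digits

# The proportions decrease from digit 1 to digit 9 and sum to 1.
round(benford_p, 3)
sum(benford_p)
```

Under this law, a leading 1 is expected in roughly 30% of cases, so a bar plot that follows Benford's Law should show bars that shrink steadily from 1 to 9.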

This is a part of the course

“Inference for Categorical Data in R”

Exercise instructions

  • Take a look at how get_first() works by typing the name of the function without parentheses. All it does is adjust the output from substr() so that it's a factor.
  • Use mutate() to add a column called first_digit to the iran data frame, containing the first digit of each city's total votes cast.
  • Check that get_first() worked. From the iran data, select the columns total_votes_cast and first_digit and print them to the screen.
  • Construct a bar plot to visualize the distribution of the first digit.
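The steps above can be sketched on a small made-up data frame, assuming dplyr and ggplot2 are installed. The toy data, the city labels, and get_first_sketch() are hypothetical; only the column names total_votes_cast and first_digit come from the exercise, and the real iran data and get_first() may behave differently.

```r
library(dplyr)
library(ggplot2)

# Made-up data standing in for the iran data frame.
toy <- data.frame(
  city = c("A", "B", "C", "D", "E"),
  total_votes_cast = c(48213, 1520, 907, 13344, 2210)
)

# Hypothetical stand-in for the course's get_first() helper.
get_first_sketch <- function(x) {
  factor(substr(as.character(x), 1, 1), levels = as.character(1:9))
}

# Add the first digit of each city's vote total as a new column.
toy <- toy %>%
  mutate(first_digit = get_first_sketch(total_votes_cast))

# Check the result by viewing the two columns side by side.
toy %>%
  select(total_votes_cast, first_digit)

# Bar plot of how often each first digit occurs.
p <- ggplot(toy, aes(x = first_digit)) +
  geom_bar()
p
```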

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Print get_first
get_first

# Create first_digit variable
iran <- iran %>%
  ___
  
# Check if get_first worked
___ %>%
  ___

# Construct bar plot
___ +
  # Add bar layer
  ___


In this course you'll learn how to leverage statistical techniques for working with categorical data.

The course wraps up with two case studies using election data. Here, you'll learn how to use a Chi-squared test to check goodness-of-fit. You'll study election results from Iran and Iowa and test if Benford's law applies to these datasets.

