
The if-else structure

The math_por data frame now contains, in addition to the background variables used for joining por and math, two possibly different answers to the same questions for each student. To fix this, you'll combine these 'duplicated' answers programmatically by either:

  • taking the rounded average (if the two variables are numeric)
  • simply choosing the first answer (else).

You'll do this by using a combination of a for-loop and an if-else structure.
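
The combining rule above can be sketched on a toy data frame in base R. The data, the `.math`/`.por` column suffixes, and the helper name `combine_pair` are assumptions made for illustration; they are not part of the exercise, which uses dplyr and the real math_por data.

```r
# Toy stand-in for math_por; column names and suffixes are hypothetical
toy <- data.frame(
  absences.math = c(2, 5, 0),
  absences.por  = c(4, 6, 0),
  paid.math     = c("yes", "no", "no"),
  paid.por      = c("no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rounded average if numeric, otherwise the first answer
combine_pair <- function(first, second) {
  if (is.numeric(first)) {
    round((first + second) / 2)
  } else {
    first
  }
}

# Start with an empty data frame that has one row per student
combined <- data.frame(row.names = seq_len(nrow(toy)))

# For every original question, combine its two answer columns
for (name in c("absences", "paid")) {
  first  <- toy[[paste0(name, ".math")]]
  second <- toy[[paste0(name, ".por")]]
  combined[[name]] <- combine_pair(first, second)
}

print(combined)
```

In this sketch the numeric `absences` answers are averaged and rounded, while the character `paid` answers are taken from the first column unchanged.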

An if statement takes a single logical condition (one TRUE or FALSE value) as its argument and performs an action only if that condition is true. It can be combined with else, which handles the case where the condition is false.

if(condition) {
   do something
} else {
   do something else
}
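
As a minimal runnable illustration of the pattern, the sketch below tests whether a vector is numeric. The vector `answers` and the variable `message_text` are hypothetical names chosen for this example.

```r
# Hypothetical example vector of non-numeric answers
answers <- c("yes", "no", "yes")

# is.numeric() returns one TRUE or FALSE for the whole vector,
# so it is a valid single condition for if()
if (is.numeric(answers)) {
  message_text <- "numeric: the values can be averaged"
} else {
  message_text <- "not numeric: keep the values as they are"
}

print(message_text)
```

Because `answers` is a character vector, the condition is false and the else branch runs.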

This exercise is part of the course

Helsinki Open Data Science


Exercise instructions

  • Please note: in DataCamp, a command that spans multiple lines is best executed by first selecting (highlighting) the lines with the mouse and then pressing Ctrl+Enter.
  • Print out the column names of math_por.
  • Adjust the code: Create the data frame alc by selecting only the columns in math_por which were used for joining the two questionnaires. The names of those columns are available in the join_by object.
  • Create the object notjoined_columns and print it out.
  • Execute the for-loop (don't mind the "change me!").
  • Take a glimpse() at the alc data frame. The non-numeric (factor type) variables should look strange at this point, because the loop still fills them with the "change me!" placeholder.
  • Adjust the code inside the for-loop: if the first of the two selected columns is not numeric, add the first column to the alc data frame.
  • Execute the modified for-loop and glimpse() at the new data again.
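
The column-selection steps in these instructions can be sketched in base R on a small stand-in data frame. The exercise itself uses dplyr's select(); base R indexing is used here only to keep the sketch self-contained, and `math_toy`, `join_by_toy`, `joined_only`, and `notjoined_toy` are hypothetical names and data.

```r
# Hypothetical stand-ins for the exercise objects
math_toy <- data.frame(
  school   = c("GP", "MS"),
  age      = c(15, 16),
  absences = c(2, 5),
  paid     = c("yes", "no"),
  stringsAsFactors = FALSE
)
join_by_toy <- c("school", "age")

# Keep only the columns whose names appear in the joining vector
joined_only <- math_toy[, join_by_toy]

# Column names that were NOT used for joining:
# %in% gives TRUE for joined names, ! flips it
notjoined_toy <- colnames(math_toy)[!colnames(math_toy) %in% join_by_toy]

print(colnames(joined_only))
print(notjoined_toy)
```

The second result is the kind of character vector that the for-loop in the exercise iterates over.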

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# dplyr, math_por, join_by are available (the code below also uses math)

# print out the column names of 'math_por'


# create a new data frame with only the joined columns
alc <- select(math_por, one_of("change me!"))

# the columns in the datasets which were not used for joining the data
notjoined_columns <- colnames(math)[!colnames(math) %in% join_by]

# print out the columns not used for joining


# for every column name not used for joining...
for(column_name in notjoined_columns) {
  # select two columns from 'math_por' with the same original name
  two_columns <- select(math_por, starts_with(column_name))
  # select the first column vector of those two columns
  first_column <- select(two_columns, 1)[[1]]

  # if that first column vector is numeric...
  if(is.numeric(first_column)) {
    # take a rounded average of each row of the two columns and
    # add the resulting vector to the alc data frame
    alc[column_name] <- round(rowMeans(two_columns))
  } else { # else if it's not numeric...
    # add the first column vector to the alc data frame
    alc[column_name] <- "change me!"
  }
}

# glimpse at the new combined data
