The if-else structure
The math_por data frame now contains two possibly different answers to the same questions for each student, in addition to the background variables used for joining por and math. To resolve this, you'll write code that combines these 'duplicated' answers by either:
- taking the rounded average (if the two variables are numeric)
- simply choosing the first answer (in all other cases).
You'll do this by using a combination of a for-loop and an if-else structure.
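The two combining rules can be tried on their own before they go into the loop. This is a minimal sketch on made-up answer vectors for three students, not on math_por itself; the names answer_math, answer_por, higher_math and higher_por are hypothetical.

```r
# Hypothetical numeric answers of three students to the same question,
# one vector per questionnaire
answer_math <- c(2, 3, 4)
answer_por  <- c(3, 3, 5)

# Numeric rule: average each student's two answers, then round
numeric_combined <- round(rowMeans(cbind(answer_math, answer_por)))
numeric_combined
# The row means are 2.5, 3 and 4.5. round() rounds halves to the even
# digit, so the combined answers are 2, 3 and 4

# Non-numeric rule: keep the first answer unchanged
higher_math <- factor(c("yes", "yes", "no"))
higher_por  <- factor(c("yes", "no", "no"))
other_combined <- higher_math
other_combined
```

Because R's round() follows the "round half to even" convention, an average that lands exactly on .5 is not always rounded up.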
The if() function takes a single logical condition (one TRUE or FALSE value) as its argument and runs a block of code only when that condition is TRUE. An if can be combined with else, whose block runs when the condition is FALSE:
if(condition) {
  do something
} else {
  do something else
}
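As a concrete version of the template above, here is a small sketch that picks the combining rule for one column. The function name choose_rule() is hypothetical; the condition is.numeric() is the same one used in the exercise code.

```r
# Hypothetical example: which combining rule applies to a column?
choose_rule <- function(column) {
  # is.numeric() returns a single TRUE or FALSE for the whole vector,
  # which is the kind of condition if() expects
  if (is.numeric(column)) {
    "rounded average"
  } else {
    "first answer"
  }
}

choose_rule(c(0, 1, 3))
choose_rule(factor(c("yes", "no")))
```

Two practical points: at the top level of a script, else must start on the same line as the closing brace of the if block (as in `} else {`), otherwise R treats the if statement as finished and reports an unexpected else. A condition with more than one value, such as `c(0, 1, 3) > 0`, is an error in R 4.2.0 and later.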
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Please note! Here in DataCamp, a command that spans multiple lines is best executed by first selecting (painting) the lines with the mouse and then pressing Ctrl+Enter as usual.
- Print out the column names of math_por.
- Adjust the code: create the data frame alc by selecting only the columns in math_por that were used for joining the two questionnaires. The names of those columns are available in the join_by object.
- Create the object notjoined_columns and print it out.
- Execute the for-loop (don't mind the "change me!").
- Take a glimpse() at the alc data frame. The factor type variables should look strange at this point, because the else branch has filled them with the placeholder string "change me!".
- Adjust the code inside the for-loop: if the first of the two selected columns is not numeric, add the first column to the alc data frame.
- Execute the modified for-loop and take a glimpse() at the new data again.
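The two column-selection steps can be sketched in base R on made-up names. The vectors math_columns and join_by and the data frame toy below are hypothetical stand-ins for colnames(math), the real join_by object and math_por; in the exercise itself, the joined columns are selected with dplyr's select() and one_of() (newer dplyr versions recommend all_of() for the same purpose).

```r
# Hypothetical column names standing in for colnames(math) and join_by
math_columns <- c("school", "sex", "age", "failures", "higher", "G3")
join_by <- c("school", "sex", "age")

# %in% marks each name that appears in join_by; ! flips the result,
# so the indexing keeps only the names that were not used for joining
notjoined_columns <- math_columns[!math_columns %in% join_by]
notjoined_columns

# setdiff() expresses the same idea for vectors of unique names
setdiff(math_columns, join_by)

# Keeping only the joined columns of a data frame by name (base R)
toy <- data.frame(school = "GP", sex = "F", age = 16, failures = 0)
toy[join_by]
```

Note that `!math_columns %in% join_by` is read as `!(math_columns %in% join_by)`, because %in% binds more tightly than !.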
Have a go at this exercise by completing this sample code.
# dplyr, math, math_por and join_by are available
# print out the column names of 'math_por'
# create a new data frame with only the joined columns
alc <- select(math_por, one_of("change me!"))
# the columns in the datasets which were not used for joining the data
notjoined_columns <- colnames(math)[!colnames(math) %in% join_by]
# print out the columns not used for joining
# for every column name not used for joining...
for(column_name in notjoined_columns) {
  # select two columns from 'math_por' with the same original name
  two_columns <- select(math_por, starts_with(column_name))
  # select the first column vector of those two columns
  first_column <- select(two_columns, 1)[[1]]
  # if that first column vector is numeric...
  if(is.numeric(first_column)) {
    # take a rounded average of each row of the two columns and
    # add the resulting vector to the alc data frame
    alc[column_name] <- round(rowMeans(two_columns))
  } else { # else if it's not numeric...
    # add the first column vector to the alc data frame
    alc[column_name] <- "change me!"
  }
}
# glimpse at the new combined data
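To see the whole loop work outside DataCamp, here is a sketch on a small, made-up stand-in for math_por. The column names, the ".math"/".por" suffixes and the values are hypothetical, and base R's startsWith() and [ ] indexing replace dplyr's select() and starts_with() so that the block runs without extra packages. The else branch shows one way to complete the "change me!" placeholder, following the instruction to add the first column.

```r
# Hypothetical stand-in for math_por after the join: one joined column
# plus two answers (".math" and ".por") to each non-joined question
math_por <- data.frame(
  school        = c("GP", "GP", "MS"),
  failures.math = c(0, 1, 2),
  failures.por  = c(0, 2, 2),
  higher.math   = factor(c("yes", "yes", "no")),
  higher.por    = factor(c("yes", "no", "no"))
)
join_by <- "school"
notjoined_columns <- c("failures", "higher")

# start from the joined columns only
alc <- math_por[join_by]

for (column_name in notjoined_columns) {
  # base R replacement for select(math_por, starts_with(column_name))
  two_columns <- math_por[startsWith(names(math_por), column_name)]
  # the first of the two columns, as a plain vector
  first_column <- two_columns[[1]]
  if (is.numeric(first_column)) {
    # numeric: rounded row-wise average of the two answers
    alc[column_name] <- round(rowMeans(two_columns))
  } else {
    # not numeric: keep the first answer
    alc[column_name] <- first_column
  }
}

str(alc)
```

Like starts_with(), startsWith() matches on prefixes, so the loop assumes that no non-joined column name is a prefix of another. Also note that starts_with() ignores case by default, while startsWith() does not.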