The if-else structure
The math_por data frame now contains, in addition to the background variables used for joining por and math, two possibly different answers to the same questions for each student. To fix this, you'll use programming to combine these 'duplicated' answers by either:
- taking the rounded average (if the two variables are numeric)
- simply choosing the first answer (else).
You'll do this by using a combination of a for-loop and an if-else structure.
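The two combining rules can be tried out on a few toy values first. The sketch below is only an illustration: the column names (G1.math, G1.por, paid.math, paid.por) and the values are made up, not taken from the real math_por data. It also shows one detail worth knowing: R's round() rounds a trailing .5 to the nearest even number, so an average of 10.5 becomes 10 while 11.5 becomes 12.

```r
# Hypothetical toy data: two numeric answers per student to the same question
answers <- data.frame(
  G1.math = c(10, 11, 14),
  G1.por  = c(11, 12, 14)
)

# Rule 1 (numeric): row-wise mean of the two answers, then rounded
row_avg <- rowMeans(answers)   # 10.5, 11.5, 14.0
rounded <- round(row_avg)      # 10, 12, 14 (halves go to the even number)
print(rounded)

# Rule 2 (not numeric): simply keep the first of the two answers
paid.math <- c("yes", "no", "no")
paid.por  <- c("no", "no", "yes")
first_answer <- paid.math
print(first_answer)
```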
The if() function takes a single logical condition as an argument and performs an action only if that condition is true. if can then be combined with else, which handles the cases where the condition is false.
if(condition) {
  do something
} else {
  do something else
}
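As a runnable counterpart to the template above, here is a minimal sketch that uses is.numeric() as the condition. The helper describe_column() is a hypothetical name introduced only for this illustration; it is not part of the exercise code.

```r
# Hypothetical helper: report which combining rule would apply to a column
describe_column <- function(x) {
  if (is.numeric(x)) {
    # the condition is TRUE: this branch runs
    "numeric: take the rounded average"
  } else {
    # the condition is FALSE: this branch runs instead
    "not numeric: keep the first answer"
  }
}

print(describe_column(c(1, 2, 3)))
print(describe_column(c("yes", "no")))
# factors are not numeric either, so they take the else branch
print(describe_column(factor(c("yes", "no"))))
```

Note that the condition must be a single TRUE or FALSE. is.numeric() satisfies this because it checks the vector as a whole and does not return one value per element.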
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Please note! Here in DataCamp, executing a command over multiple lines is best done by selecting (painting) the lines with a mouse first and then hitting Ctrl+Enter normally.
- Print out the column names of math_por.
- Adjust the code: Create the data frame alc by selecting only the columns in math_por which were used for joining the two questionnaires. The names of those columns are available in the join_by object.
- Create the object notjoined_columns and print it out.
- Execute the for-loop (don't mind the "change me!").
- Take a glimpse() at the alc data frame. The factor type variables should look strange at this point.
- Adjust the code inside the for-loop: if the first of the two selected columns is not numeric, add the first column to the alc data frame.
- Execute the modified for-loop and glimpse() at the new data again.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# dplyr, math_por, join_by are available
# print out the column names of 'math_por'
# create a new data frame with only the joined columns
alc <- select(math_por, one_of("change me!"))
# the columns in the datasets which were not used for joining the data
notjoined_columns <- colnames(math)[!colnames(math) %in% join_by]
# print out the columns not used for joining
# for every column name not used for joining...
for(column_name in notjoined_columns) {
  # select two columns from 'math_por' with the same original name
  two_columns <- select(math_por, starts_with(column_name))
  # select the first column vector of those two columns
  first_column <- select(two_columns, 1)[[1]]

  # if that first column vector is numeric...
  if(is.numeric(first_column)) {
    # take a rounded average of each row of the two columns and
    # add the resulting vector to the alc data frame
    alc[column_name] <- round(rowMeans(two_columns))
  } else { # else if it's not numeric...
    # add the first column vector to the alc data frame
    alc[column_name] <- "change me!"
  }
}
# glimpse at the new combined data
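To see the whole pattern run end to end without the course data, the loop can be sketched on a small made-up data frame. Everything named toy, toy_join_by, toy_notjoined and combined below is hypothetical, as are the .math and .por column suffixes. The sketch uses base R only so that it runs without dplyr: startsWith() stands in as a rough counterpart of starts_with(), and str() stands in for glimpse().

```r
# Hypothetical stand-in for math_por: one joining column (id) plus
# duplicated answers carrying assumed .math / .por suffixes
toy <- data.frame(
  id        = 1:3,
  G1.math   = c(10, 11, 14),
  G1.por    = c(11, 12, 14),
  paid.math = c("yes", "no", "no"),
  paid.por  = c("no", "no", "yes"),
  stringsAsFactors = FALSE
)
toy_join_by   <- "id"             # columns used for joining
toy_notjoined <- c("G1", "paid")  # original names of the duplicated columns

# start from the joining columns only
combined <- toy[toy_join_by]

for (column_name in toy_notjoined) {
  # pick the two columns whose names start with the original name
  two_columns  <- toy[startsWith(names(toy), column_name)]
  first_column <- two_columns[[1]]

  if (is.numeric(first_column)) {
    # numeric: rounded row-wise average of the two answers
    combined[column_name] <- round(rowMeans(two_columns))
  } else {
    # not numeric: keep the first answer
    combined[column_name] <- first_column
  }
}

# compact overview of the combined data
str(combined)
```

One caveat of prefix matching, in this sketch and with starts_with() alike: a name such as G1 would also match a column called G10, so the approach relies on the original names not being prefixes of one another.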