Dealing with not available (NA) values
In R, NA stands for not available, which means that the data point is missing. If a variable you wish to analyse contains missing values, there are usually two main options:
- Remove the observations with missing values.
- Replace the missing values with actual values using an imputation technique.
We will use the first option, which is the simplest solution.
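To make the two options concrete, here is a minimal sketch in base R. The data frame `toy` and its values are made up for illustration and are not part of the course data, and mean imputation is only one of many possible imputation techniques.

```r
# Hypothetical toy data (not the course's 'human' data)
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  gni     = c(100, NA, 300, 400),
  life    = c(70, 75, NA, 80)
)

# Option 1: remove every observation (row) that has at least one NA
toy_complete <- toy[complete.cases(toy), ]

# Option 2: replace each NA with the mean of the observed values
# in the same column (a very simple imputation technique)
toy_imputed <- toy
toy_imputed$gni[is.na(toy_imputed$gni)]   <- mean(toy$gni, na.rm = TRUE)
toy_imputed$life[is.na(toy_imputed$life)] <- mean(toy$life, na.rm = TRUE)

nrow(toy_complete)  # rows left after removal
anyNA(toy_imputed)  # are any NA values left after imputation?
```

Removal is simpler but discards rows B and C entirely, whereas imputation keeps all four rows at the cost of inventing values.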
This exercise is part of the course
Helsinki Open Data Science
Exercise instructions
- Create a smaller version of the human data by selecting the variables defined in keep.
- Use complete.cases() on human to print out a logical "completeness indicator" vector.
- Adjust the code: Define comp as the completeness indicator and print out the resulting data frame. When is the indicator FALSE and when is it TRUE? (hint: ?complete.cases())
- filter() out all the rows with any NA values. Right now, TRUE is recycled so that nothing is filtered out.
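The behaviour of the completeness indicator and of a recycled TRUE can be seen on a small example. This is a sketch in base R with a made-up data frame `toy`; it uses bracket subsetting rather than filter(), on the assumption that the recycling of a single logical value works the same way for the purpose of this illustration.

```r
# Hypothetical toy data with one NA in row 2 and one in row 3
toy <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

# TRUE only for rows where no column is NA
comp <- complete.cases(toy)
comp
# [1]  TRUE FALSE FALSE

# A single TRUE is recycled to every row, so nothing is dropped
all_rows <- toy[TRUE, ]

# Using the indicator itself keeps only the complete rows
only_complete <- toy[comp, ]
```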
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# human with modified GNI and dplyr are available
# columns to keep
keep <- c("Country", "Edu2.FM", "Labo.FM", "Life.Exp", "Edu.Exp", "GNI", "Mat.Mor", "Ado.Birth", "Parli.F")
# select the 'keep' columns
human <- select(human, one_of(keep))
# print out a completeness indicator of the 'human' data
complete.cases(human)
# print out the data along with a completeness indicator as the last column
data.frame(human[-1], comp = "change me!")
# filter out all rows with NA values
human_ <- filter(human, TRUE)
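One possible way to complete the exercise is sketched below. Because the course's human data is not available here, the sketch uses a hypothetical stand-in called `human_demo` with made-up values and only three of the columns; it assumes dplyr is installed. It is not necessarily the course's reference solution.

```r
library(dplyr)

# Hypothetical stand-in for the 'human' data (values are made up)
human_demo <- data.frame(
  Country  = c("A", "B", "C"),
  GNI      = c(50000, NA, 12000),
  Life.Exp = c(81.2, 70.5, NA),
  Extra    = c(1, 2, 3)
)

# columns to keep (a shortened version of the exercise's 'keep')
keep <- c("Country", "GNI", "Life.Exp")

# select the 'keep' columns, as in the exercise template
human_demo <- select(human_demo, one_of(keep))

# completeness indicator: TRUE for rows without any NA
comp <- complete.cases(human_demo)

# print the data along with the indicator as the last column
data.frame(human_demo[-1], comp = comp)

# keep only the rows where the indicator is TRUE
human_demo_ <- filter(human_demo, complete.cases(human_demo))
```

The same pattern should carry over to the exercise by replacing `human_demo` with human and using the full keep vector.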