Dealing with not available (NA) values

In R, NA stands for not available, which means that the data point is missing. If a variable you wish to analyse contains missing values, there are usually two main options:

  • Remove the observations with missing values
  • Replace the missing values with estimated values using an imputation technique

We will use the first option, which is the simplest solution.
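
As a rough illustration of the two options, here is a minimal sketch in base R. The data frame `df` and its values are made up for this example (they are not the human data), and mean imputation is only one simple imputation technique chosen for contrast.

```r
# Toy data frame with hypothetical values (not the 'human' data)
df <- data.frame(
  id = c("a", "b", "c", "d"),
  x  = c(1, NA, 3, 4),
  y  = c(10, 20, NA, 40)
)

# Option 1: remove every observation (row) that has at least one NA
df_removed <- df[complete.cases(df), ]

# Option 2: replace each NA with a value,
# here the mean of the observed values in the same column
df_imputed <- df
df_imputed$x[is.na(df_imputed$x)] <- mean(df$x, na.rm = TRUE)
df_imputed$y[is.na(df_imputed$y)] <- mean(df$y, na.rm = TRUE)

print(df_removed)  # only the rows without any NA remain
print(df_imputed)  # all four rows remain, with the NAs filled in
```

Removing rows is simpler but shrinks the data, while imputation keeps every row at the cost of introducing estimated values.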

This exercise is part of the course

Helsinki Open Data Science

Exercise instructions

  • Create a smaller version of the human data by selecting the variables defined in keep
  • Use complete.cases() on human to print out a logical "completeness indicator" vector
  • Adjust the code: Define comp as the completeness indicator and print out the resulting data frame. When is the indicator FALSE and when is it TRUE? (hint: ?complete.cases()).
  • filter() out all the rows with any NA values. Right now, TRUE is recycled so that nothing is filtered out.
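
The behaviour described in these steps can be sketched on a small made-up data frame. The name `toy` and its columns are hypothetical and stand in for the human data; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Toy data with hypothetical values (not the 'human' data)
toy <- data.frame(
  country = c("A", "B", "C"),
  gni     = c(100, NA, 300),
  life    = c(70, 80, NA)
)

# One logical value per row: TRUE only when the row has no NA in any column
comp <- complete.cases(toy)
print(comp)

# Show the indicator next to the data as the last column
toy_with_comp <- data.frame(toy, comp = comp)
print(toy_with_comp)

# A single TRUE is recycled to every row, so all rows are kept
kept_all <- filter(toy, TRUE)

# A logical vector with one value per row keeps only the rows where it is TRUE
kept_complete <- filter(toy, comp)
print(kept_complete)
```

The contrast between `kept_all` and `kept_complete` is the point: `filter()` needs a condition that varies by row before it can drop anything.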

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# the 'human' data (with modified GNI) and the dplyr package are available

# columns to keep
keep <- c("Country", "Edu2.FM", "Labo.FM", "Life.Exp", "Edu.Exp", "GNI", "Mat.Mor", "Ado.Birth", "Parli.F")

# select the 'keep' columns
human <- select(human, one_of(keep))

# print out a completeness indicator of the 'human' data
complete.cases(human)

# print out the data along with a completeness indicator as the last column
data.frame(human[-1], comp = "change me!")

# filter out all rows with NA values
human_ <- filter(human, TRUE)