Dealing with not available (NA) values
In R, NA stands for not available, which means that the data point is missing. If a variable you wish to analyse contains missing values, there are usually two main options:
- Remove the observations with missing values
- Replace the missing values with estimated values using an imputation technique.
We will use the first option, which is the simplest solution.
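The following is a minimal base R sketch of the first option. It drops incomplete rows from a small toy data frame. The data frame df and its values are hypothetical and made up for illustration. complete.cases() and na.omit() are standard functions from R's stats package.

```r
# Toy data frame with made-up values (hypothetical, for illustration only)
df <- data.frame(
  id = 1:4,
  x  = c(1.5, NA, 3.0, 4.2),
  y  = c("a", "b", NA, "d")
)

# Option 1: drop every row that contains at least one NA
complete_rows <- complete.cases(df)   # one logical value per row
df_complete   <- df[complete_rows, ]  # keep only the rows marked TRUE
print(df_complete)                    # rows with id 1 and 4

# na.omit() keeps the same rows and records the dropped ones in an attribute
df_omit <- na.omit(df)
print(nrow(df_omit))                  # 2
```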
This exercise is part of the course Helsinki Open Data Science.
Exercise instructions
- Create a smaller version of the human data by selecting the variables defined in keep.
- Use complete.cases() on human to print out a logical "completeness indicator" vector.
- Adjust the code: define comp as the completeness indicator and print out the resulting data frame. When is the indicator FALSE and when is it TRUE? (Hint: ?complete.cases)
- filter() out all the rows with any NA values. Right now, TRUE is recycled, so nothing is filtered out.
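This sketch shows when the completeness indicator is TRUE and when it is FALSE. It uses a hypothetical stand-in for human: the data frame toy, its columns and its values are all made up. A row counts as complete only when none of its values is NA. The manual rowSums(is.na(...)) check is included only to make that rule explicit.

```r
# Hypothetical stand-in for 'human' (made-up values, not the real data)
toy <- data.frame(
  Country  = c("A", "B", "C"),
  GNI      = c(60000, NA, 25000),
  Life.Exp = c(81.6, 82.4, NA)
)

# TRUE only for rows without any NA, FALSE as soon as one value is missing
comp <- complete.cases(toy)
print(comp)  # TRUE FALSE FALSE

# The same rule written by hand: count the NAs in each row and compare with 0
manual <- unname(rowSums(is.na(toy)) == 0)

# Drop the first column and append the indicator as the last column, as in the exercise
print(data.frame(toy[-1], comp = comp))
```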
Try the exercise by completing the sample code.
# human with modified GNI and dplyr are available
# columns to keep
keep <- c("Country", "Edu2.FM", "Labo.FM", "Life.Exp", "Edu.Exp", "GNI", "Mat.Mor", "Ado.Birth", "Parli.F")
# select the 'keep' columns
human <- select(human, one_of(keep))
# print out a completeness indicator of the 'human' data
complete.cases(human)
# print out the data along with a completeness indicator as the last column
data.frame(human[-1], comp = "change me!")
# filter out all rows with NA values
human_ <- filter(human, TRUE)
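The following is a sketch of the finished filtering step. It assumes that dplyr is installed. Because the real data is not included here, it uses a hypothetical, made-up stand-in for human. The sketch compares the placeholder filter(human, TRUE), where the single TRUE is recycled to every row, with filtering on complete.cases(human).

```r
library(dplyr)

# Hypothetical stand-in for 'human' (made-up values, not the real data)
human <- data.frame(
  Country  = c("A", "B", "C", "D"),
  GNI      = c(60000, NA, 25000, 12000),
  Life.Exp = c(81.6, 82.4, NA, 70.3),
  Parli.F  = c(39.6, 30.5, 20.1, NA)
)

# A single TRUE is recycled to every row, so no rows are removed
human_all <- filter(human, TRUE)
nrow(human_all)  # 4

# The completeness indicator keeps only the rows without any NA
human_ <- filter(human, complete.cases(human))
nrow(human_)     # 1
```

In base R, human[complete.cases(human), ] selects the same rows.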