
Searching for and replacing missing values

1. Searching for and replacing missing values

You now know how to visually explore missing values. But this assumes your missing values are coded as NA. This is often not the case!

2. What we are going to cover

In this chapter we introduce assumptions in missing data: how to look for hidden missing values, replacing missing value labels with NA, and checking your assumptions on missingness.

3. Searching for and replacing missing values

Imagine you are done cleaning your data. But then, just when you start to relax, you find some clearly missing values in the data: only they weren't labelled NA! Instead they are: "missing", "Not Available", and "N/A"! You made a mistake: assuming that missing values are coded as NA.

4. Understanding Chaos

In this lesson, we are going to cover how to search for unexpected missing values, and how to replace them with regular NA values, once you've found them. We will use a dataset called chaos, which contains gnarly values like plain whitespace, ".", "N/A", and "missing".

5. Searching for missing values

But before we jump in and replace our values with NA, we should get a sense of how big this missing data problem is, by searching for strange missing values. We can do this with the `miss_scan_count` function, which takes a dataframe and a "search" parameter: a list of values to search for. This returns a dataframe with two columns: "Variable", the name of each variable, and "n", the number of times the search values appear in that variable. Here, we see that searching for "N/A" returns 1 hit for the variable grade.
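The search described above can be sketched as follows. The `chaos` data frame here is a small made-up stand-in (its columns and values are assumptions for illustration, not the course dataset), built so that "N/A" appears once in `grade`.

```r
library(naniar)

# Hypothetical stand-in for the chaos dataset; values are invented for illustration
chaos <- data.frame(
  score = c(3, -99, 4, -98, 7),
  grade = c("N/A", "A", "N/a", "missing", "B"),
  place = c(" ", ".", "city", "na", "town"),
  stringsAsFactors = FALSE
)

# Count how often "N/A" appears in each variable
hits <- miss_scan_count(data = chaos, search = list("N/A"))
hits
```

The result has one row per variable, so a quick scan of the "n" column shows where the unusual label occurs.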

6. Searching for missing values

miss_scan_count can take multiple values in the search, so you can look for all the strange missing values you like! Here we see that when searching for both "N/A" (capital A) and "N/a" (lowercase a), there are two hits for the variable grade.
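A minimal sketch of a multi-value search, again using a made-up stand-in for `chaos` (an assumption, not the course data) in which `grade` contains one "N/A" and one "N/a":

```r
library(naniar)

# Hypothetical stand-in for the chaos dataset; values are invented for illustration
chaos <- data.frame(
  score = c(3, -99, 4, -98, 7),
  grade = c("N/A", "A", "N/a", "missing", "B"),
  place = c(" ", ".", "city", "na", "town"),
  stringsAsFactors = FALSE
)

# Search for both spellings at once; counts are combined per variable
hits_multi <- miss_scan_count(data = chaos, search = list("N/A", "N/a"))
hits_multi
```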

7. Replacing missing values

Once you've explored and searched for different missing values, you can replace them using the function replace_with_na. This takes a dataframe and a named list, where each name is a variable and each element holds the values you want to replace with NA in that variable. For example, using our chaos dataset, we can replace the values "N/A" and "N/a" in the variable "grade" with code that reads as: use chaos, then replace with NA, for the variable "grade", the values "N/A" and "N/a". This replaces some of the missing values, but not all of them: other labels, such as "missing", are left untouched.
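The replacement can be sketched like this. The `chaos` data frame is a made-up stand-in (an assumption for illustration), and dplyr is loaded only to provide the `%>%` pipe.

```r
library(naniar)
library(dplyr)  # loaded here only for the %>% pipe

# Hypothetical stand-in for the chaos dataset; values are invented for illustration
chaos <- data.frame(
  score = c(3, -99, 4, -98, 7),
  grade = c("N/A", "A", "N/a", "missing", "B"),
  place = c(" ", ".", "city", "na", "town"),
  stringsAsFactors = FALSE
)

# Replace "N/A" and "N/a" with NA, but only in the grade variable
chaos_clean <- chaos %>%
  replace_with_na(replace = list(grade = c("N/A", "N/a")))
chaos_clean
```

Only the listed values in the named variable change, so the label "missing" in `grade` is still there afterwards and would need its own entry in the list.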

8. "scoped variants" of replace_with_na

The replace_with_na function can be repetitive if you need to use it across many variables, for many different values. It also does not handle more complex cases, where you might only want to replace values less than -1, or only treat character columns. To account for these situations, naniar borrows from dplyr's scoped variants and extends replace_with_na to create three functions: replace_with_na_all, which operates on all variables; replace_with_na_at, which operates on a subset of selected variables; and replace_with_na_if, which operates on a subset of variables that fulfill some condition, such as being numeric or character. We will now go over an example use of replace_with_na_all.
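As a rough sketch of the two variants that are not walked through in this lesson, the example below applies replace_with_na_at to one selected variable and replace_with_na_if to character columns only. The `chaos` data frame is a made-up stand-in, and the chosen conditions are assumptions for illustration.

```r
library(naniar)

# Hypothetical stand-in for the chaos dataset; values are invented for illustration
chaos <- data.frame(
  score = c(3, -99, 4, -98, 7),
  grade = c("N/A", "A", "N/a", "missing", "B"),
  place = c(" ", ".", "city", "na", "town"),
  stringsAsFactors = FALSE
)

# _at: only the selected variable, replacing values less than -1
chaos_at <- replace_with_na_at(
  data = chaos,
  .vars = "score",
  condition = ~ .x < -1
)

# _if: only variables that satisfy the predicate (here, character columns)
chaos_if <- replace_with_na_if(
  data = chaos,
  .predicate = is.character,
  condition = ~ .x %in% c("N/A", "missing")
)
```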

9. Using scoped variants of replace_with_na

The scoped variants of replace_with_na follow a specific syntax. You provide a condition argument, and pass it a special function that starts with the squiggly line, tilde (~), and when referring to a variable, you use ".x". For example, if we want to replace all cases of -99 in a dataset, we use replace_with_na_all, and write: "chaos THEN replace_with_na_all, where the variable is equal to -99".
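Written out, the tilde syntax looks roughly like this. The `chaos` data frame is a made-up stand-in (an assumption for illustration) with a single -99 in `score`, and dplyr is loaded only for the `%>%` pipe.

```r
library(naniar)
library(dplyr)  # loaded here only for the %>% pipe

# Hypothetical stand-in for the chaos dataset; values are invented for illustration
chaos <- data.frame(
  score = c(3, -99, 4, -98, 7),
  grade = c("N/A", "A", "N/a", "missing", "B"),
  place = c(" ", ".", "city", "na", "town"),
  stringsAsFactors = FALSE
)

# The formula after ~ is evaluated for every variable, with .x standing in for it
chaos_no99 <- chaos %>%
  replace_with_na_all(condition = ~ .x == -99)
chaos_no99
```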

10. Using scoped variants of replace_with_na

To replace all values "N/A", "missing", or "na" with NA, we would write data THEN replace_with_na_all where variables are in "N/A", "missing", or "na". We can see how these give us more control over the variables that we affect.
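A minimal sketch of this set-membership condition, using `%in%` inside the formula. The `chaos` data frame is a made-up stand-in (an assumption for illustration), and dplyr is loaded only for the `%>%` pipe.

```r
library(naniar)
library(dplyr)  # loaded here only for the %>% pipe

# Hypothetical stand-in for the chaos dataset; values are invented for illustration
chaos <- data.frame(
  score = c(3, -99, 4, -98, 7),
  grade = c("N/A", "A", "N/a", "missing", "B"),
  place = c(" ", ".", "city", "na", "town"),
  stringsAsFactors = FALSE
)

# Replace any value that exactly matches one of the listed labels, in every variable
chaos_labels <- chaos %>%
  replace_with_na_all(condition = ~ .x %in% c("N/A", "missing", "na"))
chaos_labels
```

Matching with `%in%` is exact, so "N/a" in `grade` is not replaced here because it is not in the list.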

11. Let's practice!

Now let's explore how to search for and replace missing values.
