Advanced debugging
1. Advanced debugging
In this lesson, we will consider an advanced approach to dealing with errors in parallel R code.
2. The default error behavior
When something in our code raises an error, R stops executing. This makes sense: we want to investigate the error and make sure the results we get are reliable.
3. Toy example
Suppose we have a list of three vectors: two numeric and one character. If we apply the square-root function to this list using lapply(), we understandably get an error. But we also lose the results for the first two vectors, which are numeric.
4. Ambiguity
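The toy example can be reproduced with a short sketch. The vector names and values below are hypothetical, and try() is used only so the script keeps running after the error and we can inspect what lapply() returned.

```r
# Hypothetical toy list: two numeric vectors and one character vector
ls_vars <- list(a = 1:4, b = c(9, 16), c = c("x", "y"))

# sqrt() fails on the character element, and lapply() returns nothing at all.
# try() captures the error only so this script can keep running.
res <- try(lapply(ls_vars, sqrt), silent = TRUE)

class(res)  # "try-error": the results for a and b are lost too
cat(res)    # print the error message
```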
This is because once an error occurs, all code execution halts. Even if the error affects only one of many tasks, R does not know what to make of it until we tell it how to behave in case of an error.
5. Catching the error
So let's specify that behavior. We create a custom square-root function built around tryCatch(). tryCatch() takes an R expression as its first argument and lets us tell R how to behave if evaluating that expression raises an error. To its error argument, we supply a handler function that takes one argument: the error object itself. We have named this argument "e", but any name works. Here, the handler simply returns the error message.
6. Catching the error
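A minimal sketch of sqrt_custom() as described, applied to the same hypothetical toy list. It uses conditionMessage(e), which returns the same text as e$message for ordinary R errors; the lesson's exact handler code may differ.

```r
# Custom square root: evaluate sqrt(x), but on error return the message
sqrt_custom <- function(x) {
  tryCatch(
    sqrt(x),                                  # expression to evaluate
    error = function(e) conditionMessage(e)   # handler receives the error object
  )
}

# Hypothetical toy list: two numeric vectors and one character vector
ls_vars <- list(a = 1:4, b = c(9, 16), c = c("x", "y"))

res <- lapply(ls_vars, sqrt_custom)
res$b  # 3 4
res$c  # the error message, as a character string
```

Every element now holds either a result or a message, so one bad input no longer wipes out the others.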
Now let's apply sqrt_custom() to our list of vectors, and we get our results. Notice that the error happened anyway; we haven't discovered a way to take the square root of letters! But this time, R knew how to behave when the error occurred: it simply relayed the error message to the output.
7. Catching errors in parallel
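The difference matters most on a cluster. This sketch uses parLapply() from the base parallel package with two workers (an arbitrary choice) on the hypothetical toy list: the plain call loses every result, while the tryCatch() version returns all three.

```r
library(parallel)

sqrt_custom <- function(x) {
  tryCatch(sqrt(x), error = function(e) conditionMessage(e))
}
ls_vars <- list(a = 1:4, b = c(9, 16), c = c("x", "y"))  # hypothetical data

cl <- makeCluster(2)

# Without tryCatch(): the worker's error is re-raised in the main session,
# so no results come back, not even for a and b
plain <- try(parLapply(cl, ls_vars, sqrt), silent = TRUE)

# With tryCatch() inside the mapped function: every task returns something
safe <- parLapply(cl, ls_vars, sqrt_custom)

stopCluster(cl)

inherits(plain, "try-error")  # TRUE
safe$b                        # 3 4
```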
This functionality is especially useful in parallel code, where an error in a single worker process can cause the loss of all results. tryCatch() lets us recover the results of the tasks that ran without errors and also see where the error occurred.
8. The births data
For a real example, let's look at US births data for a given year. We have a list of data frames, one per state. For each state, we have the month and the number of babies born in each birth event. We'd like to check whether total births in a given state follow a monthly trend.
9. The summarizing function
We write a function that uses dplyr to total the number of births in each month. Let's suppose we know the data are prone to errors, so we wrap the dplyr code in curly braces and supply it to tryCatch() as the expression. If an error does occur, the handler passes a message to the results in place of the summary.
10. Parallel apply
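Here is a sketch of summarise_births() and the parallel call, run on a small hypothetical ls_births (three states, made-up values) rather than the real data. The births column name, the "Error found:" prefix and the two-worker cluster are assumptions for illustration; dplyr must be installed.

```r
library(parallel)
library(dplyr)

# Hypothetical stand-in for ls_births: one data frame per state with the
# month and the plurality (babies born) of each birth event.
# Alabama's plurality is deliberately stored as text to trigger an error.
ls_births <- list(
  Alabama = data.frame(month = c(1, 1, 2), plurality = c("1", "2", "1")),
  Alaska  = data.frame(month = c(1, 2, 2), plurality = c(1, 1, 2)),
  Arizona = data.frame(month = c(1, 1, 2), plurality = c(2, 1, 1))
)

# Total births per month; tryCatch() turns any error into a message
summarise_births <- function(df) {
  tryCatch({
    df %>%
      group_by(month) %>%
      summarise(births = sum(plurality))
  }, error = function(e) paste("Error found:", conditionMessage(e)))
}

cl <- makeCluster(2)
invisible(clusterEvalQ(cl, library(dplyr)))  # load dplyr on every worker
res <- parLapply(cl, ls_births, summarise_births)
stopCluster(cl)

res$Alaska   # monthly totals
res$Alabama  # "Error found: ..." instead of a summary
```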
Now let's apply this function in parallel. We set up a cluster of four cores and load dplyr on each one using clusterEvalQ(). Using this cluster, we apply summarise_births() to each element of ls_births in a parLapply() call, and we stop the cluster once done. When we run this code, we get results for all states except Alabama, where an error occurred.
11. Examine the source
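Locating the faulty input can be sketched in base R. The data below are hypothetical; with real data, the same checks point at the column whose type breaks the aggregation.

```r
# Hypothetical toy data shaped like ls_births; Alabama's plurality is text
ls_births <- list(
  Alabama = data.frame(month = c(1, 1, 2), plurality = c("1", "2", "1")),
  Alaska  = data.frame(month = c(1, 2, 2), plurality = c(1, 1, 2))
)

# Scan up to the first ten rows of the state that failed
head(ls_births[["Alabama"]], 10)
str(ls_births[["Alabama"]])  # shows the type of each column

# Flag every state whose plurality column is not numeric
is_num <- vapply(ls_births, function(df) is.numeric(df$plurality), logical(1))
names(is_num)[!is_num]  # "Alabama"
```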
If we scan the first ten rows of the Alabama data, we can see that the plurality column contains strings, which cause the error when aggregating. tryCatch() can catch any error raised while the code runs, as long as the code can be parsed, that is, as long as it contains no syntax errors.
12. Future map
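A sketch of the same pattern with furrr, assuming the furrr and future packages are installed. plan(multisession, workers = 2) is one possible backend choice, and the toy list is hypothetical.

```r
library(future)
library(furrr)

sqrt_custom <- function(x) {
  tryCatch(sqrt(x), error = function(e) conditionMessage(e))
}
ls_vars <- list(a = 1:4, b = c(9, 16), c = c("x", "y"))  # hypothetical data

plan(multisession, workers = 2)  # background R sessions act as workers
res <- future_map(ls_vars, sqrt_custom)
plan(sequential)                 # return to sequential processing

res[[3]]  # the error message for the character vector
```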
Because tryCatch() operates inside the function applied to each element, we can use it with any parallel package. Here is an example with future_map() from the furrr package.
13. The foreach case
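And a sketch with foreach, assuming the foreach and doParallel packages are installed and a two-worker cluster. %dopar% exports sqrt_custom() to the workers automatically because the loop body references it.

```r
library(parallel)
library(foreach)
library(doParallel)

sqrt_custom <- function(x) {
  tryCatch(sqrt(x), error = function(e) conditionMessage(e))
}
ls_vars <- list(a = 1:4, b = c(9, 16), c = c("x", "y"))  # hypothetical data

cl <- makeCluster(2)
registerDoParallel(cl)  # use this cluster as the %dopar% backend

# By default foreach collects the results in a list, one element per input
res <- foreach(x = ls_vars) %dopar% sqrt_custom(x)

stopCluster(cl)
```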
The same approach works with foreach.
14. Let's practice!
Now let's try and catch some errors!