A possibly() version of read_lines()
We are still working with the series of URLs you were given to scrape, trying out several methods for identifying URLs that can't be accessed. Why? Because the first step of web scraping is checking whether you can access a URL at all, and that is exactly what the code we are writing here does.
In the previous exercise, we wrapped the read_lines() function inside safely(). In this exercise, we will use the possibly() function instead.

In web terminology, a 404 status code indicates that a web page is not available. This number will be used as the otherwise argument.
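As a quick sketch of how possibly() behaves, here is a made-up risky_read() helper standing in for read_lines() on an unreachable URL, so it runs without any network access:

```r
library(purrr)

# Hypothetical stand-in for read_lines() on a URL that can't be reached
# (not part of the exercise data)
risky_read <- function(x) {
  if (x == "bad_url") stop("cannot open the connection")
  c("first line", "second line")
}

possible_read <- possibly(risky_read, otherwise = 404)

possible_read("good_url")  # returns the character vector as usual
possible_read("bad_url")   # returns 404 instead of throwing an error
```

Unlike safely(), which always returns a list with result and error elements, possibly() simply swaps in the otherwise value when the wrapped function errors.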
Also, as read_lines() returns a vector of length n when reading a webpage, we'll collapse these lines into a single string using the paste() function.
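For example, paste() with collapse = " " turns a multi-element vector into a length-one character vector (the page content here is made up for illustration):

```r
# read_lines() would return one element per line of the page;
# collapse = " " glues them into a single string
page_lines <- c("<html>", "<body>Hello</body>", "</html>")
paste(page_lines, collapse = " ")
# [1] "<html> <body>Hello</body> </html>"
```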
The urls vector has been provided for you.
This exercise is part of the course Intermediate Functional Programming with purrr.
Exercise instructions
- Wrap the read_lines() function in a possibly() call that will otherwise return 404.
- Map this newly created function over the URL list, and pipe the result straight into set_names().
- Turn each element of this list into a length-one character vector by using the paste() function, with the collapse argument set to " ".
- Keep only the elements which are equal to 404.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a possibly() version of read_lines()
possible_read <- ___(___, otherwise = ___)
# Map this function on urls, pipe it into set_names()
res <- map(urls, ___) %>% ___(urls)
# Paste each element of the list
res_pasted <- ___(res, ___, collapse = ___)
# Keep only the elements which are equal to 404
___(res_pasted, ___)
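For reference, one possible completion of the template might look like the sketch below. To keep it runnable without network access, read_lines() is swapped for a made-up fake_read() and urls for a toy vector; with the real read_lines() and the provided urls, the blanks fill in the same way.

```r
library(purrr)
library(magrittr)  # for %>%

# Hypothetical stand-in for read_lines(): one made-up URL "fails"
# the way an unreachable page would
fake_read <- function(url) {
  if (grepl("broken", url)) stop("HTTP error 404")
  c("<html>", "<body>ok</body>", "</html>")
}

urls <- c("https://example.org", "https://example.org/broken")

# Create a possibly() version of the reader
possible_read <- possibly(fake_read, otherwise = 404)

# Map this function on urls, pipe it into set_names(),
# then paste each element into a single string
res_pasted <- map(urls, possible_read) %>%
  set_names(urls) %>%
  map(paste, collapse = " ")

# Keep only the elements equal to 404; note that paste() has turned
# the numeric 404 into "404", so == (which coerces) does the matching
keep(res_pasted, ~ .x == 404)
```

After the paste() step, every element is a length-one character vector, so the final keep() call returns a named list whose names are exactly the URLs that could not be read.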